Compare commits

..

4 Commits

Author SHA1 Message Date
Karthik Satchitanand 7b40e3a1df
[Cherry-Pick for v1.8.2] (#157)
* refactor(events): removing unnecessary event (#139)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* chore(ipfilter): Adding ipfilter in network chaos for containerd/crio (#140)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* (fix)containerKill: Fix the container kill experiment for containerd runtime (#142)

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

* chore(log-mgmt): Refactoring code for better log/error mgmt (#141)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* chore(actions): Add GitHub action custom test to test a perticular PR (#143)

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

* Fix typo (#147)

Signed-off-by: Moti Asayag <masayag@redhat.com>

* Refactor/Modify travis and makefile to add linting and formating code (#136)

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

* (chore)action: Fix the actions for merge (#149)

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

* (chore)dockerfile: Optimize Dockerfile (#150)

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

* chore(abort): Adding support for abortion in all experiments (#145)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* chore(probe): updating probes status in abort case (#154)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* fix(parelle;-execution): Adding parellel execution in pod cpu/memory-hog experiments (#152)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
Co-authored-by: UDIT GAURAV <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Moti Asayag <masayag@redhat.com>
2020-09-29 18:03:23 +05:30
Shubham Chaudhary 58ffc6a555
fix(pumba-lib): fixing regex for the pumba pod (#135) (#137)
Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>
2020-09-18 17:44:27 +05:30
UDIT GAURAV bfdb4216dd
cherry-pick for 1.8.1 (#134)
* chore(socketPath): correcting socketPath env name (#124)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* chore(network-chaos): Adding ability to inject network chaos w/ pause container (#126)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* fix(permission-issue): fixing the username permission issue for sockfile (#129)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* chore(k8sprobe): Adding label-selector field in k8sprobe (#127)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* Chore/ Splitting different files for network chaos experiment (#128)

* Chore/ Splitting different variables for network chaos experiment

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>

* fix(userid): revert the userid from dockerfile (#131)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* fix(duplicate): fixing network duplicate exp (#132)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* chore(network-chaos): splitting network chaos for containerd (#133)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
2020-09-18 10:26:43 +05:30
UDIT GAURAV 4c62402cd0
[Cherry Pick for 1.8.0] (#122)
* Add support to specify network chaos targets as hostnames (#110)

* Support specifying target hosts with pumba lib while performing network chaos experiments

Signed-off-by: piyush0609 <sinha.piyush0609@gmail.com>

* chore(abort-event): Adding abort events for network-chaos (#121)

Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>

* fix(Dockerfile): Add a non root user to run the process in Dockerfile (#120)

Signed-off-by: Udit Gaurav <udit.gaurav@mayadata.io>
2020-09-15 19:40:41 +05:30
2415 changed files with 751941 additions and 42997 deletions

5
.github/CODEOWNERS vendored
@@ -1,5 +0,0 @@
# Lines starting with '#' are comments.
# Each line is a file pattern followed by one or more owners.
# These owners will be the default owners for everything in the repo.
* @ispeakc0de @ksatchit @uditgaurav

@@ -1,27 +0,0 @@
<!-- This form is for bug reports and feature requests ONLY! -->
<!-- Thanks for filing an issue! Before hitting the button, please answer these questions.-->
## Is this a BUG REPORT or FEATURE REQUEST?
Choose one: BUG REPORT or FEATURE REQUEST
<!--
If this is a BUG REPORT, please:
- Fill in as much of the template below as you can. If you leave out information, we can't help you as well.
If this is a FEATURE REQUEST, please:
- Describe *in detail* the feature/behavior/change you'd like to see.
In both cases, be ready for followup questions, and please respond in a timely
manner. If we can't reproduce a bug or think a feature already exists, we
might close your issue. If we're wrong, PLEASE feel free to reopen it and
explain why.
-->
**What happened**:
**What you expected to happen**:
**How to reproduce it (as minimally and precisely as possible)**:
**Anything else we need to know?**:

@@ -1,18 +0,0 @@
<!-- Thanks for sending a pull request! Here are some tips for you -->
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Checklist:**
- [ ] Fixes #<issue number>
- [ ] PR message has documentation-related information
- [ ] Labelled this PR & related issue with `breaking-changes` tag
- [ ] PR message has breaking-changes-related information
- [ ] Labelled this PR & related issue with `requires-upgrade` tag
- [ ] PR message has upgrade-related information
- [ ] Commit has unit tests
- [ ] Commit has integration tests
- [ ] E2E run Required for the changes

@@ -1,23 +0,0 @@
# Configuration for probot-auto-merge - https://github.com/bobvanderlinden/probot-auto-merge
reportStatus: true
updateBranch: false
deleteBranchAfterMerge: true
mergeMethod: squash
minApprovals:
COLLABORATOR: 0
maxRequestedChanges:
NONE: 0
blockingLabels:
- DO NOT MERGE
- WIP
- blocked
# Will merge whenever the above conditions are met, but also
# the owner has approved or merge label was added.
rules:
- minApprovals:
OWNER: 1
- requiredLabels:
- merge

@@ -1,98 +0,0 @@
---
name: Build
on:
pull_request:
branches: [master]
types: [opened, synchronize, reopened]
jobs:
pre-checks:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: gofmt check
run: |
if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
then
echo "The following files were found to be not go formatted:"
gofmt -s -l .
exit 1
fi
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
gitleaks-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Run GitLeaks
run: |
wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz && \
tar -zxvf gitleaks_8.18.2_linux_x64.tar.gz && \
sudo mv gitleaks /usr/local/bin && gitleaks detect --source . -v
build:
needs: pre-checks
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
with:
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v1
with:
version: latest
- name: Build and push
uses: docker/build-push-action@v2
with:
push: false
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:ci
build-args: LITMUS_VERSION=3.10.0
trivy:
needs: pre-checks
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Build an image from Dockerfile
run: |
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.10.0
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: 'docker.io/litmuschaos/go-runner:${{ github.sha }}'
format: 'table'
exit-code: '1'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'

141
.github/workflows/guide.md vendored Normal file
@@ -0,0 +1,141 @@
# Run E2E tests using GitHub Chaos Actions
- When you commit code to your repository, you can continuously build and test it to make sure the commit doesn't introduce errors. An error may be a security, functional, or performance issue, and can be caught with custom tests, linters, or existing actions. *Chaos Actions* extend this by running a chaos test against the application for a particular commit, which helps track how the application performs at the commit level. A chaos test is triggered by commenting on the Pull Request.
## Through comments on PR
- To run the tests for an experiment or a group of experiments, comment on the Pull Request in the following format:
```bash
/run-e2e-<test-name/test-group>
```
_Experiments available for the custom bot:_
<table style="width:100%">
<tr>
<th>Resource Chaos</th>
<th>Network Chaos</th>
<th>IO Chaos</th>
<th>Others</th>
</tr>
<tr>
<td>pod-cpu-hog</td>
<td>pod-network-latency</td>
<td>node-io-stress</td>
<td>pod-delete</td>
</tr>
<tr>
<td>pod-memory-hog</td>
<td>pod-network-loss</td>
<td></td>
<td>container-kill</td>
</tr>
<tr>
<td>node-cpu-hog</td>
<td>pod-network-corruption</td>
<td></td>
<td>pod-autoscaler</td>
</tr>
<tr>
<td>node-memory-hog</td>
<td>pod-network-duplication</td>
<td></td>
<td></td>
</tr>
</table>
### Group Tests
<table style="width:100%">
<tr>
<th>Command</th>
<th>Description</th>
</tr>
<tr>
<td><code>/run-e2e-all</code></td>
<td>Runs all available tests: all resource chaos tests, network chaos tests, IO chaos tests, and the other tests. The comment is updated with the result as each test passes.</td>
</tr>
<tr>
<td><code>/run-e2e-network-chaos</code></td>
<td>Runs all network chaos tests. This includes pod network corruption, pod network duplication, pod network loss, and pod network latency.</td>
</tr>
<tr>
<td><code>/run-e2e-resource-chaos</code></td>
<td>Runs all resource chaos tests. This includes the pod-level and node-level CPU and memory chaos tests.</td>
</tr>
<tr>
<td><code>/run-e2e-io-chaos</code></td>
<td>Runs all IO chaos tests. Currently this includes only node IO stress.</td>
</tr>
</table>
### Individual Tests
<table style="width:100%">
<tr>
<th>Command</th>
<th>Description</th>
</tr>
<tr>
<td><code>/run-e2e-pod-delete</code></td>
<td>Runs the pod delete chaos test using the GitHub chaos action, which fails the application pod.</td>
</tr>
<tr>
<td><code>/run-e2e-container-kill</code></td>
<td>Runs the container kill experiment using the GitHub chaos action, which kills containers in the application pod.</td>
</tr>
<tr>
<td><code>/run-e2e-pod-cpu-hog</code></td>
<td>Runs the pod-level CPU chaos experiment using the GitHub chaos action, which consumes CPU resources on the application container.</td>
</tr>
<tr>
<td><code>/run-e2e-pod-memory-hog</code></td>
<td>Runs the pod-level memory chaos test, which consumes memory resources on the application container.</td>
</tr>
<tr>
<td><code>/run-e2e-node-cpu-hog</code></td>
<td>Runs the node-level CPU chaos test, which exhausts CPU resources on the Kubernetes node.</td>
</tr>
<tr>
<td><code>/run-e2e-node-memory-hog</code></td>
<td>Runs the node-level memory chaos test, which exhausts memory resources on the Kubernetes node.</td>
</tr>
<tr>
<td><code>/run-e2e-node-io-stress</code></td>
<td>Runs the node-level IO chaos test, which applies IO stress on the Kubernetes node.</td>
</tr>
<tr>
<td><code>/run-e2e-pod-network-corruption</code></td>
<td>Runs the pod-network-corruption test, which injects network packet corruption into the application pod.</td>
</tr>
<tr>
<td><code>/run-e2e-pod-network-latency</code></td>
<td>Runs the pod-network-latency test, which injects network packet latency into the application pod.</td>
</tr>
<tr>
<td><code>/run-e2e-pod-network-loss</code></td>
<td>Runs the pod-network-loss test, which injects network packet loss into the application pod.</td>
</tr>
<tr>
<td><code>/run-e2e-pod-network-duplication</code></td>
<td>Runs the pod-network-duplication test, which injects network packet duplication into the application pod.</td>
</tr>
</table>
***Note:*** *All the tests are performed on a KinD cluster with containerd runtime.*
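The mapping from a comment to the experiments it triggers can be sketched in shell. This is only an illustration of the prefix matching described in the tables above, not the bot's actual implementation; the function name `experiments_for_comment` is hypothetical, and the group membership is copied from the tables in this guide.

```bash
# Sketch: print the experiments that a PR comment would trigger.
# experiments_for_comment is a hypothetical helper, not part of the workflow.
experiments_for_comment() {
  resource="pod-cpu-hog pod-memory-hog node-cpu-hog node-memory-hog"
  network="pod-network-latency pod-network-loss pod-network-corruption pod-network-duplication"
  io="node-io-stress"
  others="pod-delete container-kill pod-autoscaler"

  # Group commands are matched by prefix, so trailing text is ignored.
  case "$1" in
    /run-e2e-all*)            echo "$resource $network $io $others"; return 0 ;;
    /run-e2e-resource-chaos*) echo "$resource"; return 0 ;;
    /run-e2e-network-chaos*)  echo "$network"; return 0 ;;
    /run-e2e-io-chaos*)       echo "$io"; return 0 ;;
  esac

  # Individual commands: /run-e2e-<experiment-name>, also matched by prefix.
  for exp in $resource $network $io $others; do
    case "$1" in
      "/run-e2e-$exp"*) echo "$exp"; return 0 ;;
    esac
  done

  # No known test matches this comment.
  return 1
}

experiments_for_comment "/run-e2e-io-chaos"   # prints: node-io-stress
```

A comment that matches no known command makes the function return a non-zero status, which corresponds to the case where no test is found for the comment.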
## Merge a Pull Request
- To auto-merge a PR, comment `/merge` on it. This adds the `merge` label to the PR and then merges the PR according to the ENVs provided.
_Minimum Number of Approvals:_
- The action automatically checks whether the required number of review approvals has been reached. If it has not, the PR is not merged.
- It works according to the role of the commenter and the branch protection rules on the repository.
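The label part of the merge decision can be sketched as follows. This is a hedged illustration rather than the merge action's own code: it assumes the rule list has the form used by the workflow's `MERGE_LABELS` value (`merge,!WIP,!DO NOT MERGE`), where a plain entry is a required label and an entry prefixed with `!` is a blocking label. The helper names `has_label` and `labels_allow_merge` are hypothetical, and review approvals are not modelled here.

```bash
# has_label LIST LABEL: succeed if LABEL is an element of the comma-separated LIST.
has_label() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# labels_allow_merge RULES PR_LABELS
# RULES:     comma-separated rules, e.g. 'merge,!WIP,!DO NOT MERGE'
# PR_LABELS: comma-separated labels currently on the PR
labels_allow_merge() {
  rest="$1,"
  while [ -n "$rest" ]; do
    rule="${rest%%,*}"   # first rule in the remaining list
    rest="${rest#*,}"    # drop that rule and its comma
    case "$rule" in
      "")   ;;                                                 # skip empty entries
      '!'*) if has_label "$2" "${rule#!}"; then return 1; fi ;; # blocking label present
      *)    if ! has_label "$2" "$rule"; then return 1; fi ;;   # required label missing
    esac
  done
  return 0
}

if labels_allow_merge 'merge,!WIP,!DO NOT MERGE' 'bug,merge'; then
  echo "labels allow merge"
fi
```

Splitting on commas with parameter expansion, rather than on whitespace, keeps labels that contain spaces (such as `DO NOT MERGE`) intact.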

407
.github/workflows/main.yml vendored Normal file
@@ -0,0 +1,407 @@
name: LitmusGo-CI
on:
issue_comment:
types: [created]
jobs:
tests:
if: contains(github.event.comment.html_url, '/pull/') && startsWith(github.event.comment.body, '/run-e2e')
runs-on: ubuntu-latest
steps:
- name: Notification for e2e Start
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
**Test Status:** The e2e test has started, please wait for the results ...
****
| Experiment | Result | Runtime |
|------------|--------|---------|
#Using the last commit id of pull request
- uses: octokit/request-action@v2.x
id: get_PR_commits
with:
route: GET /repos/:repo/pulls/:pull_number/commits
repo: ${{ github.repository }}
pull_number: ${{ github.event.issue.number }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: set commit to output
id: getcommit
run: |
prsha=$(echo "$response" | jq -r '.[-1].sha')
echo "sha=$prsha" >> "$GITHUB_OUTPUT"
env:
response: ${{ steps.get_PR_commits.outputs.data }}
- uses: actions/checkout@v2
with:
ref: ${{steps.getcommit.outputs.sha}}
- name: Generating Go binary and Building docker image
run: |
make build
#Install and configure a kind cluster
- name: Installing Prerequisites (KinD Cluster)
uses: engineerd/setup-kind@v0.4.0
with:
version: "v0.7.0"
- name: Configuring and testing the Installation
run: |
kubectl cluster-info --context kind-kind
kind get kubeconfig --internal >$HOME/.kube/config
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- name: Deploy a sample application for chaos injection
run: |
kubectl apply -f https://raw.githubusercontent.com/mayadata-io/chaos-ci-lib/master/app/nginx.yml
sleep 30
- name: Setting up kubeconfig ENV for Github Chaos Action
run: echo "KUBE_CONFIG_DATA=$(base64 -w 0 ~/.kube/config)" >> "$GITHUB_ENV"
- name: Running Litmus pod delete chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-delete') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: pod-delete
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
LITMUS_CLEANUP: true
- name: Update pod delete result
if: startsWith(github.event.comment.body, '/run-e2e-pod-delete') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Delete | Pass | containerd |
- name: Running container kill chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-container-kill') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: container-kill
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
CONTAINER_RUNTIME: containerd
LITMUS_CLEANUP: true
- name: Update container-kill result
if: startsWith(github.event.comment.body, '/run-e2e-container-kill') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Container Kill | Pass | containerd |
- name: Running node-cpu-hog chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-node-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: node-cpu-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
LITMUS_CLEANUP: true
- name: Update node-cpu-hog result
if: startsWith(github.event.comment.body, '/run-e2e-node-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Node CPU Hog | Pass | containerd |
- name: Running node-memory-hog chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-node-memory-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: node-memory-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
LITMUS_CLEANUP: true
- name: Update node-memory-hog result
if: startsWith(github.event.comment.body, '/run-e2e-node-memory-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Node MEMORY Hog | Pass | containerd |
- name: Running pod-cpu-hog chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: pod-cpu-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
CPU_CORES: 1
LITMUS_CLEANUP: true
- name: Update pod-cpu-hog result
if: startsWith(github.event.comment.body, '/run-e2e-pod-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod CPU Hog | Pass | containerd |
- name: Running pod-memory-hog chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-memory-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: pod-memory-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
MEMORY_CONSUMPTION: 500
LITMUS_CLEANUP: true
- name: Update pod-memory-hog result
if: startsWith(github.event.comment.body, '/run-e2e-pod-memory-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Memory Hog | Pass | containerd |
- name: Running pod network corruption chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-corruption') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: pod-network-corruption
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
CONTAINER_RUNTIME: containerd
LITMUS_CLEANUP: true
- name: Update pod-network-corruption result
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-corruption') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Network Corruption | Pass | containerd |
- name: Running pod network duplication chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-duplication') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: pod-network-duplication
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
CONTAINER_RUNTIME: containerd
LITMUS_CLEANUP: true
- name: Update pod-network-duplication result
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-duplication') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Network Duplication | Pass | containerd |
- name: Running pod-network-latency chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-latency') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: pod-network-latency
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
NETWORK_LATENCY: 60000
CONTAINER_RUNTIME: containerd
LITMUS_CLEANUP: true
- name: Update pod-network-latency result
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-latency') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Network Latency | Pass | containerd |
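- name: Running pod-network-loss chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-loss') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')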
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: pod-network-loss
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
NETWORK_PACKET_LOSS_PERCENTAGE: 100
CONTAINER_RUNTIME: containerd
LITMUS_CLEANUP: true
- name: Update pod-network-loss result
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-loss') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Network Loss | Pass | containerd |
- name: Running pod autoscaler chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-autoscaler') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: pod-autoscaler
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
TOTAL_CHAOS_DURATION: 60
LITMUS_CLEANUP: true
- name: Update pod-autoscaler result
if: startsWith(github.event.comment.body, '/run-e2e-pod-autoscaler') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Autoscaler | Pass | containerd |
- name: Running node-io-stress chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-node-io-stress') || startsWith(github.event.comment.body, '/run-e2e-io-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.2.0
env:
INSTALL_LITMUS: true
EXPERIMENT_NAME: node-io-stress
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
TOTAL_CHAOS_DURATION: 120
FILESYSTEM_UTILIZATION_PERCENTAGE: 10
LITMUS_CLEANUP: true
- name: Update node-io-stress result
if: startsWith(github.event.comment.body, '/run-e2e-node-io-stress') || startsWith(github.event.comment.body, '/run-e2e-io-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Node IO Stress | Pass | containerd |
- name: Check the test run
if: |
startsWith(github.event.comment.body, '/run-e2e-pod-delete') || startsWith(github.event.comment.body, '/run-e2e-container-kill') ||
startsWith(github.event.comment.body, '/run-e2e-node-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-node-memory-hog') ||
startsWith(github.event.comment.body, '/run-e2e-pod-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-pod-memory-hog') ||
startsWith(github.event.comment.body, '/run-e2e-pod-network-corruption') || startsWith(github.event.comment.body, '/run-e2e-pod-network-loss') ||
startsWith(github.event.comment.body, '/run-e2e-pod-network-latency') || startsWith(github.event.comment.body, '/run-e2e-pod-network-duplication') ||
startsWith(github.event.comment.body, '/run-e2e-pod-autoscaler') || startsWith(github.event.comment.body, '/run-e2e-node-io-stress') ||
startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') ||
startsWith(github.event.comment.body, '/run-e2e-io-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
run: |
echo "TEST_RUN=true" >> "$GITHUB_ENV"
- name: Check for all the jobs are succeeded
if: ${{ success() && env.TEST_RUN == 'true' }}
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
****
**Test Result:** All tests passed
**Run ID:** [${{ env.RUN_ID }}](https://github.com/litmuschaos/litmus-go/actions/runs/${{ env.RUN_ID }})
reactions: hooray
env:
RUN_ID: ${{ github.run_id }}
- name: Check for any job failed
if: ${{ failure() }}
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
****
**Test Failed:** Some tests failed, please check
**Run ID:** [${{ env.RUN_ID }}](https://github.com/litmuschaos/litmus-go/actions/runs/${{ env.RUN_ID }})
reactions: confused
env:
RUN_ID: ${{ github.run_id }}
- name: Deleting KinD cluster
if: ${{ always() }}
run: kind delete cluster
- name: Check if any test ran or not
if: env.TEST_RUN != 'true'
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
****
**Test Result:** No test found
**Run ID:** [${{ env.RUN_ID }}](https://github.com/litmuschaos/litmus-go/actions/runs/${{ env.RUN_ID }})
reactions: eyes
env:
RUN_ID: ${{ github.run_id }}
# This job merges an eligible PR in two steps:
# first it adds a merge label to the target PR, then it merges the PR according to the envs provided.
merge:
if: contains(github.event.comment.html_url, '/pull/') && startsWith(github.event.comment.body, '/merge')
runs-on: ubuntu-latest
steps:
- name: Add a merge label
uses: actions-ecosystem/action-add-labels@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
labels: merge
# The action will automatically check if the required number of review approvals has been reached.
- name: automerge
uses: "pascalgn/automerge-action@f81beb99aef41bb55ad072857d43073fba833a98"
env:
GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
MERGE_LABELS: "merge,!WIP,!DO NOT MERGE"
MERGE_METHOD: "squash"
MERGE_FORKS: "true"
MERGE_RETRIES: "6"
MERGE_RETRY_SLEEP: "10000"
UPDATE_LABELS: ""
UPDATE_METHOD: "merge"
MERGE_DELETE_BRANCH: true
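The workflow above hands the kubeconfig to the chaos action as a single-line base64 string in `KUBE_CONFIG_DATA`. The round trip can be sketched as below, assuming GNU coreutils `base64` (as found on Ubuntu runners); the helper names `encode_kubeconfig` and `decode_kubeconfig` are hypothetical and are not part of the workflow or the action.

```bash
# encode_kubeconfig FILE: print FILE as one base64 line.
# -w 0 (GNU coreutils) disables line wrapping, so the value contains no newlines
# and can be stored safely in a single environment variable.
encode_kubeconfig() {
  base64 -w 0 "$1"
}

# decode_kubeconfig VALUE: print the original file content.
decode_kubeconfig() {
  printf '%s' "$1" | base64 -d
}
```

Without `-w 0`, GNU `base64` wraps its output at 76 characters, and the embedded newlines would break a `NAME=value` environment assignment.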

@@ -1,66 +0,0 @@
---
name: Push
on:
push:
branches:
- master
tags-ignore:
- '**'
jobs:
pre-checks:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '1.20'
- uses: actions/checkout@v2
- name: gofmt check
run: |
if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
then
echo "The following files were found to be not go formatted:"
gofmt -s -l .
exit 1
fi
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
push:
needs: pre-checks
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '1.20'
- uses: actions/checkout@v2
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
with:
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v1
with:
version: latest
- name: Login to Docker Hub
uses: docker/login-action@v1
with:
username: ${{ secrets.DNAME }}
password: ${{ secrets.DPASS }}
- name: Build and push
uses: docker/build-push-action@v2
with:
push: true
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:ci
build-args: LITMUS_VERSION=3.10.0

@@ -1,65 +0,0 @@
---
name: Release
on:
create:
tags:
- '**'
jobs:
pre-checks:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '1.20'
- uses: actions/checkout@v2
push:
needs: pre-checks
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '1.20'
- uses: actions/checkout@v2
- name: Set Tag
run: |
TAG="${GITHUB_REF#refs/*/}"
echo "TAG=${TAG}" >> $GITHUB_ENV
echo "RELEASE_TAG=${TAG}" >> $GITHUB_ENV
- name: Print Tag info
run: |
echo "RELEASE TAG: ${RELEASE_TAG}"
echo "${RELEASE_TAG}" > ${{ github.workspace }}/tag.txt
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
with:
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@v1
with:
version: latest
- name: Login to Docker Hub
uses: docker/login-action@v1
with:
username: ${{ secrets.DNAME }}
password: ${{ secrets.DPASS }}
- name: Build and push
uses: docker/build-push-action@v2
env:
RELEASE_TAG: ${{ env.RELEASE_TAG }}
with:
push: true
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:${{ env.RELEASE_TAG }},litmuschaos/go-runner:latest
build-args: LITMUS_VERSION=3.10.0
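The `Set Tag` step in the release workflow above derives the tag with the shell parameter expansion `${GITHUB_REF#refs/*/}`, which removes the shortest leading match of `refs/*/`. A minimal sketch of that expansion follows; the function name `tag_from_ref` and the sample ref are illustrative only.

```bash
# tag_from_ref REF: strip the leading "refs/<kind>/" component from a Git ref.
# "#" removes the shortest prefix matching the pattern, so only the first
# two path components are dropped.
tag_from_ref() {
  printf '%s\n' "${1#refs/*/}"
}

tag_from_ref "refs/tags/1.8.2"   # prints: 1.8.2
```

Because the shortest match is removed, a ref such as `refs/heads/feature/x` yields `feature/x` rather than `x`.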

@@ -1,198 +0,0 @@
name: E2E
on:
pull_request:
branches: [master]
types: [opened, synchronize, reopened]
paths-ignore:
- '**.md'
- '**.yml'
- '**.yaml'
jobs:
Pod_Level_In_Serial_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in series mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-series
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
Pod_Level_In_Parallel_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
env:
KUBECONFIG: /home/runner/.kube/config
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in parallel mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-parallel
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
Node_Level_Tests:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Setup litmus and deploy application
env:
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
- name: Running Node Drain experiments
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: make node-drain
- name: Running Node Taint experiments
if: always()
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: make node-taint
- name: Deleting KinD cluster
if: always()
run: |
kubectl get nodes
kind delete cluster


@ -1,27 +0,0 @@
---
name: Security Scan
on:
workflow_dispatch:
jobs:
trivy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Build an image from Dockerfile
run: |
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: 'docker.io/litmuschaos/go-runner:${{ github.sha }}'
format: 'table'
exit-code: '1'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'

2
.gitignore vendored

@ -1,4 +1,2 @@
build/_output
.stignore
.idea/
.vscode/

33
.travis.yml Normal file

@ -0,0 +1,33 @@
sudo: required
os: linux
dist: bionic
services:
- docker
language: go
go:
- 1.14.2
addons:
apt:
update: true
before_script:
- sudo apt-get update && sudo apt-get install golint
- export VERSION=$(curl --silent "https://api.github.com/repos/aquasecurity/trivy/releases/latest" | grep '"tag_name":' | sed -E 's/.*"v([^"]+)".*/\1/')
- sudo apt-get install -y rpm
- wget https://github.com/aquasecurity/trivy/releases/download/v${VERSION}/trivy_${VERSION}_Linux-64bit.tar.gz
- tar zxvf trivy_${VERSION}_Linux-64bit.tar.gz
script:
# Installing and configuring dependencies
- make deps
# Includes formatting, linting and check unused packages
- make gotasks
# Build
- make build
# Running trivy check
- make trivy-check
after_success:
- make push


@ -1,62 +0,0 @@
# Contributing to Litmus-Go
Litmus is an Apache 2.0 Licensed project and uses the standard GitHub pull requests process to review and accept contributions.
There are several areas of Litmus that could use your help. For starters, you could help in improving the sections in this document by either creating a new issue describing the improvement or submitting a pull request to this repository.
- If you are a first-time contributor, please see [Steps to Contribute](#steps-to-contribute).
- If you would like to suggest new tests to be added to litmus, please go ahead and [create a new issue](https://github.com/litmuschaos/litmus/issues/new) describing your test. All you need to do is specify the workload type and the operations that you would like to perform on the workload.
- If you would like to work on something more involved, please connect with the Litmus Contributors.
- If you would like to make code contributions, all your commits should be signed with Developer Certificate of Origin. See [Sign your work](#sign-your-work).
## Steps to Contribute
- Find an issue to work on or create a new issue. The issues are maintained at [litmuschaos/litmus](https://github.com/litmuschaos/litmus/issues). You can pick up from a list of [good-first-issues](https://github.com/litmuschaos/litmus/labels/good%20first%20issue).
- Claim your issue by commenting your intent to work on it to avoid duplication of efforts.
- Fork the repository on GitHub.
- Create a branch from where you want to base your work (usually master).
- Make your changes.
- Relevant coding style guidelines are the [Go Code Review Comments](https://code.google.com/p/go-wiki/wiki/CodeReviewComments) and the _Formatting and style_ section of Peter Bourgon's [Go: Best Practices for Production Environments](http://peter.bourgon.org/go-in-production/#formatting-and-style).
- Commit your changes by making sure the commit messages convey the need and notes about the commit.
- Push your changes to the branch in your fork of the repository.
- Submit a pull request to the original repository. See [Pull Request checklist](#pull-request-checklist)
## Pull Request Checklist
- Rebase to the current master branch before submitting your pull request.
- Commits should be as small as possible. Each commit should follow the checklist below:
- For code changes, add tests relevant to the fixed bug or new feature
- Pass the compile and tests - includes spell checks, formatting, etc
- Commit header (first line) should convey what changed
- Commit body should include details such as why the changes are required and how the proposed changes address the need
- DCO Signed
- If your PR is not getting reviewed or you need a specific person to review it, please reach out to the Litmus contributors at the [Litmus slack channel](https://app.slack.com/client/T09NY5SBT/CNXNB0ZTN)
## Sign your work
We use the Developer Certificate of Origin (DCO) as an additional safeguard for the LitmusChaos project. This is a well established and widely used mechanism to assure that contributors have confirmed their right to license their contribution under the project's license. Please add a line to every git commit message:
```sh
Signed-off-by: Random J Developer <random@developer.example.org>
```
Use your real name (sorry, no pseudonyms or anonymous contributions). The email id should match the email id provided in your GitHub profile.
If you set your `user.name` and `user.email` in git config, you can sign your commit automatically with `git commit -s`.
You can also use git [aliases](https://git-scm.com/book/tr/v2/Git-Basics-Git-Aliases) like `git config --global alias.ci 'commit -s'`. Now you can commit with `git ci` and the commit will be signed.
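As an illustration of what the sign-off requirement amounts to, here is a minimal Go sketch that checks whether a commit message carries a `Signed-off-by:` trailer. The function name `hasSignOff` and the pattern are assumptions made for this example; they are not part of this repository or of any official DCO tooling, and the pattern is deliberately loose.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// signOffPattern matches a trailer line such as
// "Signed-off-by: Random J Developer <random@developer.example.org>".
// Assumption: a name followed by an address in angle brackets is enough.
var signOffPattern = regexp.MustCompile(`^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$`)

// hasSignOff reports whether any line of the commit message is a sign-off trailer.
func hasSignOff(message string) bool {
	for _, line := range strings.Split(message, "\n") {
		if signOffPattern.MatchString(strings.TrimSpace(line)) {
			return true
		}
	}
	return false
}

func main() {
	signed := "fix(probe): handle abort\n\nSigned-off-by: Random J Developer <random@developer.example.org>\n"
	unsigned := "fix(probe): handle abort\n"
	fmt.Println(hasSignOff(signed))
	fmt.Println(hasSignOff(unsigned))
}
```

A commit created with `git commit -s` would pass this check, since git appends the trailer using the configured `user.name` and `user.email`.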
## Setting up your Development Environment
This project is implemented using Go and uses the standard golang tools for development and build. In addition, this project heavily relies on Docker and Kubernetes. It is expected that contributors:
- are familiar with working with Go
- are familiar with Docker containers
- are familiar with Kubernetes and have access to a Kubernetes cluster or Minikube to test the changes.
For the creation of new chaos-experiment and testing of the modified changes, see the detailed instructions [here](./contribute/developer-guide/README.md).
## Community
The Litmus community holds a monthly community sync-up on the 3rd Wednesday of each month, 22:00-23:00 IST / 18:30-19:30 CEST.
- The community meeting details are available [here](https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q). Please feel free to join the community meeting.

30
Dockerfile Normal file

@ -0,0 +1,30 @@
FROM golang:1.13.4 as builder
WORKDIR /tmp/litmus/
# Copying the experiments and chaos libraries
COPY . .
# After copying the files, we need to ensure the files belong to the user,
# otherwise we will not be able to write build files
USER root
RUN chown -R 500 /tmp/litmus
USER 500
WORKDIR /tmp/litmus/experiments/generic/
# We need to ensure a reasonable build cache dir, as user 500 does not exist on certain systems,
# and will not have permission to write to /.cache, as the user does not exist
ENV XDG_CACHE_HOME=/tmp/.cache
# Building the executables and placing them in a separate directory
RUN go build -o /tmp/litmus/build/ -mod vendor ./...
# Using as main image the crictl image with copying only the binaries
FROM litmuschaos/crictl:latest
WORKDIR /tmp/litmus/
COPY --from=builder /tmp/litmus/build .


@ -7,31 +7,23 @@
#
IS_DOCKER_INSTALLED = $(shell which docker >> /dev/null 2>&1; echo $$?)
# Docker info
DOCKER_REGISTRY ?= docker.io
DOCKER_REPO ?= litmuschaos
DOCKER_IMAGE ?= go-runner
DOCKER_TAG ?= ci
PACKAGES = $(shell go list ./... | grep -v '/vendor/')
.PHONY: all
all: deps gotasks build push trivy-check
.PHONY: help
help:
@echo ""
@echo "Usage:-"
@echo "\tmake deps -- sets up dependencies for image build"
@echo "\tmake push -- pushes the litmus-go multi-arch image"
@echo "\tmake build-amd64 -- builds the litmus-go binary & docker amd64 image"
@echo "\tmake push-amd64 -- pushes the litmus-go amd64 image"
@echo "\tmake all -- [default] builds the litmus containers"
@echo ""
.PHONY: all
all: deps gotasks build push trivy-check
.PHONY: deps
deps: _build_check_docker
_build_check_docker:
@echo "------------------"
@echo "--> Check the Docker deps"
@echo "--> Check the Docker deps"
@echo "------------------"
@if [ $(IS_DOCKER_INSTALLED) -eq 1 ]; \
then echo "" \
@ -41,54 +33,57 @@ _build_check_docker:
fi;
.PHONY: gotasks
gotasks: unused-package-check
gotasks: format lint unused-package-check
.PHONY: format
format:
@echo "------------------"
@echo "--> Running go fmt"
@echo "------------------"
@go fmt $(PACKAGES)
.PHONY: lint
lint:
@echo "------------------"
@echo "--> Running golint"
@echo "------------------"
@go get -u golang.org/x/lint/golint
@golint $(PACKAGES)
@echo "------------------"
@echo "--> Running go vet"
@echo "------------------"
@go vet $(PACKAGES)
.PHONY: unused-package-check
unused-package-check:
@echo "------------------"
@echo "--> Check unused packages for the litmus-go"
@echo "--> Check unused packages for the chaos-operator"
@echo "------------------"
@tidy=$$(go mod tidy); \
if [ -n "$${tidy}" ]; then \
echo "go mod tidy checking failed!"; echo "$${tidy}"; echo; \
fi
.PHONY: docker.buildx
docker.buildx:
.PHONY: build
build:
@echo "------------------------------"
@echo "--> Setting up Builder "
@echo "--> Build experiment go binary"
@echo "------------------------------"
@if ! docker buildx ls | grep -q multibuilder; then\
docker buildx create --name multibuilder;\
docker buildx inspect multibuilder --bootstrap;\
docker buildx use multibuilder;\
fi
@sh build/generate_go_binary
@echo "------------------"
@echo "--> Build go-runner image"
@echo "------------------"
sudo docker build . -f build/litmus-go/Dockerfile -t litmuschaos/go-runner:ci
.PHONY: push
push: docker.buildx image-push
push:
image-push:
@echo "------------------------"
@echo "--> Push go-runner image"
@echo "------------------------"
@echo "Pushing $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)"
@docker buildx build . --push --file build/Dockerfile --progress plain --platform linux/arm64,linux/amd64 --no-cache --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
.PHONY: build-amd64
build-amd64:
@echo "-------------------------"
@echo "--> Build go-runner image"
@echo "-------------------------"
@sudo docker build --file build/Dockerfile --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
.PHONY: push-amd64
push-amd64:
@echo "------------------------------"
@echo "--> Pushing image"
@echo "------------------------------"
@sudo docker push $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
@echo "------------------"
@echo "--> go-runner image"
@echo "------------------"
REPONAME="litmuschaos" IMGNAME="go-runner" IMGTAG="ci" ./build/push
.PHONY: trivy-check
trivy-check:
@ -96,5 +91,5 @@ trivy-check:
@echo "------------------------"
@echo "---> Running Trivy Check"
@echo "------------------------"
@./trivy --exit-code 0 --severity HIGH --no-progress $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
@./trivy --exit-code 0 --severity CRITICAL --no-progress $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
@./trivy --exit-code 0 --severity HIGH --no-progress litmuschaos/go-runner:ci
@./trivy --exit-code 0 --severity CRITICAL --no-progress litmuschaos/go-runner:ci


@ -1,40 +1,19 @@
# LitmusGo:
# LitmusGo:
[![Slack Channel](https://img.shields.io/badge/Slack-Join-purple)](https://slack.litmuschaos.io)
![GitHub Workflow](https://github.com/litmuschaos/litmus-go/actions/workflows/push.yml/badge.svg?branch=master)
[![Docker Pulls](https://img.shields.io/docker/pulls/litmuschaos/go-runner.svg)](https://hub.docker.com/r/litmuschaos/go-runner)
[![GitHub issues](https://img.shields.io/github/issues/litmuschaos/litmus-go)](https://github.com/litmuschaos/litmus-go/issues)
[![Twitter Follow](https://img.shields.io/twitter/follow/litmuschaos?style=social)](https://twitter.com/LitmusChaos)
[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/5297/badge)](https://bestpractices.coreinfrastructure.org/projects/5297)
[![Go Report Card](https://goreportcard.com/badge/github.com/litmuschaos/litmus-go)](https://goreportcard.com/report/github.com/litmuschaos/litmus-go)
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go?ref=badge_shield)
[![YouTube Channel](https://img.shields.io/badge/YouTube-Subscribe-red)](https://www.youtube.com/channel/UCa57PMqmz_j0wnteRa9nCaw)
<br><br>
- This repo consists of Litmus Chaos Experiments written in golang. The examples in this repo are good indicators
of how to construct the experiments in golang: complete with steady state checks, chaos injection, post chaos checks,
chaosresult generation, and events and reports for observability, with configurable sinks for these.
This repo consists of Litmus Chaos Experiments written in golang. The examples in this repo are good indicators of how to construct the experiments in golang: complete with steady state checks, chaos injection, post chaos checks, chaosresult generation, and events and reports for observability, with configurable sinks for these.
## Run E2E on a Pull Request
**NOTE**: This repo can be viewed as an extension to the [litmuschaos/litmus](https://github.com/litmuschaos/litmus) repo. The litmus repo will also continue to be the project's community-facing meta repo housing other important project artifacts. In that sense, litmus-go is very similar to and therefore a sister repo of [litmus-python](https://github.com/litmuschaos/litmus-python) which houses examples for experiment business logic written in python.
- We can run a certain number of custom tests on a PR using GitHub chaos actions. Read about the [custom bot](https://github.com/litmuschaos/litmus-go/blob/master/.github/workflows/guide.md) to know more.
## Litmus SDK
**NOTE**
The Litmus SDK provides a simple way to bootstrap your experiment and helps create the aforementioned artifacts in the appropriate directory (i.e., as per the chaos-category) based on an attributes file provided as input by the chart-developer. The scaffolded files consist of placeholders which can then be filled as desired.
- This repo can be viewed as an extension to the [litmuschaos/litmus](https://github.com/litmuschaos/litmus) repo
in the sense that the litmus repo also houses a significant set of experiments, built using ansible. The litmus repo
will also continue to be the project's community-facing meta repo housing other important project artifacts. In that
sense, litmus-go is very similar to and therefore a sister repo of [litmus-python](https://github.com/litmuschaos/litmus-python) which
houses examples for experiment business logic written in python.
It generates custom chaos experiments with some default Pre & Post Chaos Checks (AUT & Auxiliary Applications status checks). It can use an existing chaoslib (present inside the /chaoslib directory) if one is available; otherwise, it creates a new chaoslib inside the corresponding directory.
Refer to [Litmus-SDK](https://github.com/litmuschaos/litmus-go/blob/master/contribute/developer-guide/README.md) for more details.
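The ordering of checks around chaos injection can be pictured with a short Go sketch. This is an illustration only: the `Step` type, `runExperiment`, `errUnhealthy`, and the "Pass"/"Fail" strings are hypothetical names chosen for this example and are not the code the SDK generates.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnhealthy is a hypothetical error used to simulate a failed status check.
var errUnhealthy = errors.New("application is not healthy")

// Step is one phase of an experiment in this sketch (not a litmus-go type).
type Step struct {
	Name string
	Run  func() error
}

// runExperiment runs the phases in order and stops at the first failure.
// It returns a verdict, the names of the phases that were started, and the error.
func runExperiment(steps []Step) (string, []string, error) {
	var ran []string
	for _, s := range steps {
		ran = append(ran, s.Name)
		if err := s.Run(); err != nil {
			return "Fail", ran, fmt.Errorf("%s: %w", s.Name, err)
		}
	}
	return "Pass", ran, nil
}

func main() {
	ok := func() error { return nil }
	steps := []Step{
		{Name: "pre-chaos check", Run: ok},
		{Name: "chaos injection", Run: ok},
		{Name: "post-chaos check", Run: func() error { return errUnhealthy }},
	}
	verdict, ran, err := runExperiment(steps)
	fmt.Println(verdict, ran, err)
}
```

Stopping at the first failing phase means chaos is never injected when the pre-chaos check already fails, which is the point of running a steady-state check first.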
## How to get started?
Refer to the [LitmusChaos Docs](https://docs.litmuschaos.io) and [Experiment Docs](https://litmuschaos.github.io/litmus/experiments/categories/contents/).
## How do I contribute?
You can contribute by raising issues, improving the documentation, contributing to the core framework and tooling, etc.
Head over to the [Contribution guide](CONTRIBUTING.md)
## License
Here is a copy of the License: [`License`](LICENSE)
## License Status and Vulnerability Check
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go?ref=badge_large)


@ -1,223 +0,0 @@
package main
import (
"context"
"errors"
"flag"
"os"
// Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth"
// Or uncomment to load specific auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth/azure"
// _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
// _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
// _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"
"go.opentelemetry.io/otel"
awsSSMChaosByID "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-id/experiment"
awsSSMChaosByTag "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-tag/experiment"
azureDiskLoss "github.com/litmuschaos/litmus-go/experiments/azure/azure-disk-loss/experiment"
azureInstanceStop "github.com/litmuschaos/litmus-go/experiments/azure/instance-stop/experiment"
redfishNodeRestart "github.com/litmuschaos/litmus-go/experiments/baremetal/redfish-node-restart/experiment"
cassandraPodDelete "github.com/litmuschaos/litmus-go/experiments/cassandra/pod-delete/experiment"
gcpVMDiskLossByLabel "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-disk-loss-by-label/experiment"
gcpVMDiskLoss "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-disk-loss/experiment"
gcpVMInstanceStopByLabel "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-instance-stop-by-label/experiment"
gcpVMInstanceStop "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-instance-stop/experiment"
containerKill "github.com/litmuschaos/litmus-go/experiments/generic/container-kill/experiment"
diskFill "github.com/litmuschaos/litmus-go/experiments/generic/disk-fill/experiment"
dockerServiceKill "github.com/litmuschaos/litmus-go/experiments/generic/docker-service-kill/experiment"
kubeletServiceKill "github.com/litmuschaos/litmus-go/experiments/generic/kubelet-service-kill/experiment"
nodeCPUHog "github.com/litmuschaos/litmus-go/experiments/generic/node-cpu-hog/experiment"
nodeDrain "github.com/litmuschaos/litmus-go/experiments/generic/node-drain/experiment"
nodeIOStress "github.com/litmuschaos/litmus-go/experiments/generic/node-io-stress/experiment"
nodeMemoryHog "github.com/litmuschaos/litmus-go/experiments/generic/node-memory-hog/experiment"
nodeRestart "github.com/litmuschaos/litmus-go/experiments/generic/node-restart/experiment"
nodeTaint "github.com/litmuschaos/litmus-go/experiments/generic/node-taint/experiment"
podAutoscaler "github.com/litmuschaos/litmus-go/experiments/generic/pod-autoscaler/experiment"
podCPUHogExec "github.com/litmuschaos/litmus-go/experiments/generic/pod-cpu-hog-exec/experiment"
podCPUHog "github.com/litmuschaos/litmus-go/experiments/generic/pod-cpu-hog/experiment"
podDelete "github.com/litmuschaos/litmus-go/experiments/generic/pod-delete/experiment"
podDNSError "github.com/litmuschaos/litmus-go/experiments/generic/pod-dns-error/experiment"
podDNSSpoof "github.com/litmuschaos/litmus-go/experiments/generic/pod-dns-spoof/experiment"
podFioStress "github.com/litmuschaos/litmus-go/experiments/generic/pod-fio-stress/experiment"
podHttpLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-latency/experiment"
podHttpModifyBody "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-modify-body/experiment"
podHttpModifyHeader "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-modify-header/experiment"
podHttpResetPeer "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-reset-peer/experiment"
podHttpStatusCode "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-status-code/experiment"
podIOStress "github.com/litmuschaos/litmus-go/experiments/generic/pod-io-stress/experiment"
podMemoryHogExec "github.com/litmuschaos/litmus-go/experiments/generic/pod-memory-hog-exec/experiment"
podMemoryHog "github.com/litmuschaos/litmus-go/experiments/generic/pod-memory-hog/experiment"
podNetworkCorruption "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-corruption/experiment"
podNetworkDuplication "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-duplication/experiment"
podNetworkLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-latency/experiment"
podNetworkLoss "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-loss/experiment"
podNetworkPartition "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-partition/experiment"
podNetworkRateLimit "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-rate-limit/experiment"
kafkaBrokerPodFailure "github.com/litmuschaos/litmus-go/experiments/kafka/kafka-broker-pod-failure/experiment"
ebsLossByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-id/experiment"
ebsLossByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-tag/experiment"
ec2TerminateByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-id/experiment"
ec2TerminateByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-tag/experiment"
rdsInstanceStop "github.com/litmuschaos/litmus-go/experiments/kube-aws/rds-instance-stop/experiment"
k6Loadgen "github.com/litmuschaos/litmus-go/experiments/load/k6-loadgen/experiment"
springBootFaults "github.com/litmuschaos/litmus-go/experiments/spring-boot/spring-boot-faults/experiment"
vmpoweroff "github.com/litmuschaos/litmus-go/experiments/vmware/vm-poweroff/experiment"
cli "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus"
)
func init() {
// Log with the text formatter: full timestamps, unsorted fields, untruncated level names.
logrus.SetFormatter(&logrus.TextFormatter{
FullTimestamp: true,
DisableSorting: true,
DisableLevelTruncation: true,
})
}
func main() {
initCtx := context.Background()
// Set up Observability.
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
shutdown, err := telemetry.InitOTelSDK(initCtx, true, otelExporterEndpoint)
if err != nil {
log.Errorf("Failed to initialize OTel SDK: %v", err)
return
}
defer func() {
err = errors.Join(err, shutdown(initCtx))
}()
initCtx = telemetry.GetTraceParentContext()
}
clients := cli.ClientSets{}
ctx, span := otel.Tracer(telemetry.TracerName).Start(initCtx, "ExecuteExperiment")
defer span.End()
// parse the experiment name
experimentName := flag.String("name", "pod-delete", "name of the chaos experiment")
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Errorf("Unable to Get the kubeconfig, err: %v", err)
return
}
log.Infof("Experiment Name: %v", *experimentName)
// invoke the corresponding experiment based on the (-name) flag
switch *experimentName {
case "container-kill":
containerKill.ContainerKill(ctx, clients)
case "disk-fill":
diskFill.DiskFill(ctx, clients)
case "kafka-broker-pod-failure":
kafkaBrokerPodFailure.KafkaBrokerPodFailure(ctx, clients)
case "kubelet-service-kill":
kubeletServiceKill.KubeletServiceKill(ctx, clients)
case "docker-service-kill":
dockerServiceKill.DockerServiceKill(ctx, clients)
case "node-cpu-hog":
nodeCPUHog.NodeCPUHog(ctx, clients)
case "node-drain":
nodeDrain.NodeDrain(ctx, clients)
case "node-io-stress":
nodeIOStress.NodeIOStress(ctx, clients)
case "node-memory-hog":
nodeMemoryHog.NodeMemoryHog(ctx, clients)
case "node-taint":
nodeTaint.NodeTaint(ctx, clients)
case "pod-autoscaler":
podAutoscaler.PodAutoscaler(ctx, clients)
case "pod-cpu-hog-exec":
podCPUHogExec.PodCPUHogExec(ctx, clients)
case "pod-delete":
podDelete.PodDelete(ctx, clients)
case "pod-io-stress":
podIOStress.PodIOStress(ctx, clients)
case "pod-memory-hog-exec":
podMemoryHogExec.PodMemoryHogExec(ctx, clients)
case "pod-network-corruption":
podNetworkCorruption.PodNetworkCorruption(ctx, clients)
case "pod-network-duplication":
podNetworkDuplication.PodNetworkDuplication(ctx, clients)
case "pod-network-latency":
podNetworkLatency.PodNetworkLatency(ctx, clients)
case "pod-network-loss":
podNetworkLoss.PodNetworkLoss(ctx, clients)
case "pod-network-partition":
podNetworkPartition.PodNetworkPartition(ctx, clients)
case "pod-network-rate-limit":
podNetworkRateLimit.PodNetworkRateLimit(ctx, clients)
case "pod-memory-hog":
podMemoryHog.PodMemoryHog(ctx, clients)
case "pod-cpu-hog":
podCPUHog.PodCPUHog(ctx, clients)
case "cassandra-pod-delete":
cassandraPodDelete.CasssandraPodDelete(ctx, clients)
case "aws-ssm-chaos-by-id":
awsSSMChaosByID.AWSSSMChaosByID(ctx, clients)
case "aws-ssm-chaos-by-tag":
awsSSMChaosByTag.AWSSSMChaosByTag(ctx, clients)
case "ec2-terminate-by-id":
ec2TerminateByID.EC2TerminateByID(ctx, clients)
case "ec2-terminate-by-tag":
ec2TerminateByTag.EC2TerminateByTag(ctx, clients)
case "ebs-loss-by-id":
ebsLossByID.EBSLossByID(ctx, clients)
case "ebs-loss-by-tag":
ebsLossByTag.EBSLossByTag(ctx, clients)
case "rds-instance-stop":
rdsInstanceStop.RDSInstanceStop(ctx, clients)
case "node-restart":
nodeRestart.NodeRestart(ctx, clients)
case "pod-dns-error":
podDNSError.PodDNSError(ctx, clients)
case "pod-dns-spoof":
podDNSSpoof.PodDNSSpoof(ctx, clients)
case "pod-http-latency":
podHttpLatency.PodHttpLatency(ctx, clients)
case "pod-http-status-code":
podHttpStatusCode.PodHttpStatusCode(ctx, clients)
case "pod-http-modify-header":
podHttpModifyHeader.PodHttpModifyHeader(ctx, clients)
case "pod-http-modify-body":
podHttpModifyBody.PodHttpModifyBody(ctx, clients)
case "pod-http-reset-peer":
podHttpResetPeer.PodHttpResetPeer(ctx, clients)
case "vm-poweroff":
vmpoweroff.VMPoweroff(ctx, clients)
case "azure-instance-stop":
azureInstanceStop.AzureInstanceStop(ctx, clients)
case "azure-disk-loss":
azureDiskLoss.AzureDiskLoss(ctx, clients)
case "gcp-vm-disk-loss":
gcpVMDiskLoss.VMDiskLoss(ctx, clients)
case "pod-fio-stress":
podFioStress.PodFioStress(ctx, clients)
case "gcp-vm-instance-stop":
gcpVMInstanceStop.VMInstanceStop(ctx, clients)
case "redfish-node-restart":
redfishNodeRestart.NodeRestart(ctx, clients)
case "gcp-vm-instance-stop-by-label":
gcpVMInstanceStopByLabel.GCPVMInstanceStopByLabel(ctx, clients)
case "gcp-vm-disk-loss-by-label":
gcpVMDiskLossByLabel.GCPVMDiskLossByLabel(ctx, clients)
case "spring-boot-cpu-stress", "spring-boot-memory-stress", "spring-boot-exceptions", "spring-boot-app-kill", "spring-boot-faults", "spring-boot-latency":
springBootFaults.Experiment(ctx, clients, *experimentName)
case "k6-loadgen":
k6Loadgen.Experiment(ctx, clients)
default:
log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *experimentName)
return
}
}


@ -1,90 +0,0 @@
package main
import (
"context"
"errors"
"flag"
"os"
// Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth"
// Or uncomment to load specific auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth/azure"
// _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
// _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
// _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"
containerKill "github.com/litmuschaos/litmus-go/chaoslib/litmus/container-kill/helper"
diskFill "github.com/litmuschaos/litmus-go/chaoslib/litmus/disk-fill/helper"
httpChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/helper"
networkChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/helper"
dnsChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/pod-dns-chaos/helper"
stressChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/helper"
cli "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
func init() {
// Log with the text formatter: full timestamps, unsorted fields, untruncated level names.
logrus.SetFormatter(&logrus.TextFormatter{
FullTimestamp: true,
DisableSorting: true,
DisableLevelTruncation: true,
})
}
func main() {
ctx := context.Background()
// Set up Observability.
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
shutdown, err := telemetry.InitOTelSDK(ctx, true, otelExporterEndpoint)
if err != nil {
log.Errorf("Failed to initialize OTel SDK: %v", err)
return
}
defer func() {
err = errors.Join(err, shutdown(ctx))
}()
ctx = telemetry.GetTraceParentContext()
}
clients := cli.ClientSets{}
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "ExecuteExperimentHelper")
defer span.End()
// parse the helper name
helperName := flag.String("name", "", "name of the helper pod")
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Errorf("Unable to Get the kubeconfig, err: %v", err)
return
}
log.Infof("Helper Name: %v", *helperName)
// invoke the corresponding helper based on the (-name) flag
switch *helperName {
case "container-kill":
containerKill.Helper(ctx, clients)
case "disk-fill":
diskFill.Helper(ctx, clients)
case "dns-chaos":
dnsChaos.Helper(ctx, clients)
case "stress-chaos":
stressChaos.Helper(ctx, clients)
case "network-chaos":
networkChaos.Helper(ctx, clients)
case "http-chaos":
httpChaos.Helper(ctx, clients)
default:
log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *helperName)
return
}
}

@ -1,112 +0,0 @@
# Multi-stage docker build
# Build stage
FROM golang:1.22 AS builder
ARG TARGETOS=linux
ARG TARGETARCH
ADD . /litmus-go
WORKDIR /litmus-go
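# Each RUN starts a fresh shell, so GOOS/GOARCH are set on the build commands themselves
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o /output/experiments ./bin/experiment
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o /output/helpers ./bin/helper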
# Packaging stage
FROM registry.access.redhat.com/ubi9/ubi:9.4
LABEL maintainer="LitmusChaos"
ARG TARGETARCH
ARG LITMUS_VERSION
# Install generally useful things
RUN yum install -y \
sudo \
sshpass \
procps \
openssh-clients
# tc binary
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-6.2.0-5.el9.$(uname -m).rpm
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-tc-6.2.0-5.el9.$(uname -m).rpm
# iptables
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iptables-libs-1.8.8-6.el9_1.$(uname -m).rpm
RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-libs-1.8.8-6.el9.2.$(uname -m).rpm
RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-1.8.8-6.el9.2.$(uname -m).rpm
# stress-ng
RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/Judy-1.0.5-28.el9.$(uname -m).rpm
RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/stress-ng-0.14.00-2.el9.$(uname -m).rpm
#Installing Kubectl
ENV KUBE_LATEST_VERSION="v1.31.0"
RUN curl -L https://storage.googleapis.com/kubernetes-release/release/${KUBE_LATEST_VERSION}/bin/linux/${TARGETARCH}/kubectl -o /usr/bin/kubectl && \
chmod 755 /usr/bin/kubectl
#Installing crictl binaries
RUN curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.31.1/crictl-v1.31.1-linux-${TARGETARCH}.tar.gz --output crictl-v1.31.1-linux-${TARGETARCH}.tar.gz && \
tar zxvf crictl-v1.31.1-linux-${TARGETARCH}.tar.gz -C /sbin && \
chmod 755 /sbin/crictl
#Installing promql cli binaries
RUN curl -L https://github.com/chaosnative/promql-cli/releases/download/3.0.0-beta6/promql_linux_${TARGETARCH} --output /usr/bin/promql && chmod 755 /usr/bin/promql
#Installing pause cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/pause-linux-${TARGETARCH} --output /usr/bin/pause && chmod 755 /usr/bin/pause
#Installing dns_interceptor cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/dns_interceptor --output /sbin/dns_interceptor && chmod 755 /sbin/dns_interceptor
#Installing nsutil cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil-linux-${TARGETARCH} --output /sbin/nsutil && chmod 755 /sbin/nsutil
#Installing nsutil shared lib
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil_${TARGETARCH}.so --output /usr/local/lib/nsutil.so && chmod 755 /usr/local/lib/nsutil.so
# Installing toxiproxy binaries
RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/cli/cli/toxiproxy-cli-linux-${TARGETARCH}.tar.gz --output toxiproxy-cli-linux-${TARGETARCH}.tar.gz && \
tar zxvf toxiproxy-cli-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
chmod 755 /sbin/toxiproxy-cli
RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/server/server/toxiproxy-server-linux-${TARGETARCH}.tar.gz --output toxiproxy-server-linux-${TARGETARCH}.tar.gz && \
tar zxvf toxiproxy-server-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
chmod 755 /sbin/toxiproxy-server
ENV APP_USER=litmus
ENV APP_DIR="/$APP_USER"
ENV DATA_DIR="$APP_DIR/data"
# The user ID of the application user
ENV APP_USER_ID=2000
RUN useradd -s /bin/true -u $APP_USER_ID -m -d $APP_DIR $APP_USER
# Change the group to 0 (root) because OpenShift runs the container with an arbitrary UID that is a member of the root group
RUN chgrp -R 0 "$APP_DIR" && chmod -R g=u "$APP_DIR"
# Giving sudo to all users (required for almost all experiments)
RUN echo 'ALL ALL=(ALL:ALL) NOPASSWD: ALL' >> /etc/sudoers
WORKDIR $APP_DIR
COPY --from=builder /output/ .
COPY --from=docker:27.0.3 /usr/local/bin/docker /sbin/docker
RUN chmod 755 /sbin/docker
# Set permissions and ownership for the copied binaries
RUN chmod 755 ./experiments ./helpers && \
chown ${APP_USER}:0 ./experiments ./helpers
# Set ownership for binaries in /sbin and /usr/bin
RUN chown ${APP_USER}:0 /sbin/* /usr/bin/* && \
chown root:root /usr/bin/sudo && \
chmod 4755 /usr/bin/sudo
# Copying Necessary Files
COPY ./pkg/cloud/aws/common/ssm-docs/LitmusChaos-AWS-SSM-Docs.yml ./LitmusChaos-AWS-SSM-Docs.yml
RUN chown ${APP_USER}:0 ./LitmusChaos-AWS-SSM-Docs.yml && chmod 755 ./LitmusChaos-AWS-SSM-Docs.yml
USER ${APP_USER}

build/generate_go_binary Normal file

@ -0,0 +1,40 @@
# Building go binaries for pod_delete experiment
go build -o build/_output/pod-delete ./experiments/generic/pod-delete
# Building go binaries for pod_cpu_hog experiment
go build -o build/_output/pod-cpu-hog ./experiments/generic/pod-cpu-hog
# Building go binaries for pod_memory_hog experiment
go build -o build/_output/pod-memory-hog ./experiments/generic/pod-memory-hog
# Building go binaries for pod_network_duplication experiment
go build -o build/_output/pod-network-duplication ./experiments/generic/pod-network-duplication
# Building go binaries for pod_network_latency experiment
go build -o build/_output/pod-network-latency ./experiments/generic/pod-network-latency
# Building go binaries for pod_network_loss experiment
go build -o build/_output/pod-network-loss ./experiments/generic/pod-network-loss
# Building go binaries for pod_network_corruption experiment
go build -o build/_output/pod-network-corruption ./experiments/generic/pod-network-corruption
# Building go binaries for node_taint experiment
go build -o build/_output/node-taint ./experiments/generic/node-taint
# Building go binaries for node_drain experiment
go build -o build/_output/node-drain ./experiments/generic/node-drain
# Building go binaries for kubelet_service_kill experiment
go build -o build/_output/kubelet-service-kill ./experiments/generic/kubelet-service-kill
# Building go binaries for node_memory_hog experiment
go build -o build/_output/node-memory-hog ./experiments/generic/node-memory-hog
# Building go binaries for node_cpu_hog experiment
go build -o build/_output/node-cpu-hog ./experiments/generic/node-cpu-hog
# Building go binaries for container_kill experiment
go build -o build/_output/container-kill ./experiments/generic/container-kill
# Building go binaries for disk_fill experiment
go build -o build/_output/disk-fill ./experiments/generic/disk-fill
# Building go binaries for pod-autoscaler experiment
go build -o build/_output/pod-autoscaler ./experiments/generic/pod-autoscaler
# Building go binaries for container_kill helper
go build -o build/_output/container-killer ./chaoslib/litmus/container-kill/helper
# Building go binaries for cassandra-pod-delete
go build -o build/_output/cassandra-pod-delete ./experiments/cassandra/pod-delete
# Building go binaries for network_chaos helper
go build -o build/_output/network-chaos ./chaoslib/litmus/network-chaos/helper
# Build go binaries for pod-io-stress chaos
go build -o build/_output/pod-io-stress ./experiments/generic/pod-io-stress/
# Building go binaries for node-io-stress experiment
go build -o build/_output/node-io-stress ./experiments/generic/node-io-stress/

@ -1,6 +0,0 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker

@ -0,0 +1,24 @@
FROM ubuntu:bionic
LABEL maintainer="LitmusChaos"
#Installing necessary ubuntu packages
RUN apt-get update && apt-get install -y curl bash systemd iproute2 stress-ng
#Installing Kubectl
ENV KUBE_LATEST_VERSION="v1.19.0"
RUN curl -L https://storage.googleapis.com/kubernetes-release/release/${KUBE_LATEST_VERSION}/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl && \
chmod +x /usr/local/bin/kubectl
#Installing crictl binaries
RUN curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.16.0/crictl-v1.16.0-linux-amd64.tar.gz --output crictl-v1.16.0-linux-amd64.tar.gz && \
tar zxvf crictl-v1.16.0-linux-amd64.tar.gz -C /usr/local/bin
#Installing pumba binaries
ENV PUMBA_VERSION="0.6.5"
RUN curl -L https://github.com/alexei-led/pumba/releases/download/${PUMBA_VERSION}/pumba_linux_amd64 --output /usr/local/bin/pumba && chmod +x /usr/local/bin/pumba
#Copying Necessary Files
COPY ./build/_output ./litmus/experiments
WORKDIR /litmus

build/push Executable file

@ -0,0 +1,37 @@
#!/bin/bash
set -e
if [ -z "${REPONAME}" ]
then
REPONAME="litmuschaos"
fi
if [ -z "${IMGNAME}" ] || [ -z "${IMGTAG}" ];
then
echo "Image details are missing. Nothing to push.";
exit 1
fi
IMAGEID=$( sudo docker images -q ${REPONAME}/${IMGNAME}:${IMGTAG} )
if [ ! -z "${DNAME}" ] && [ ! -z "${DPASS}" ];
then
# Pass the password on stdin so it does not appear in the process list
printf '%s' "${DPASS}" | sudo docker login -u "${DNAME}" --password-stdin;
# Push image to docker hub
echo "Pushing ${REPONAME}/${IMGNAME}:${IMGTAG} ...";
sudo docker push ${REPONAME}/${IMGNAME}:${IMGTAG} ;
if [ ! -z "${TRAVIS_TAG}" ] ;
then
# Push with additional tags if this build is tagged as a release.
# When a release is tagged on GitHub, Travis sets the
# release tag in the TRAVIS_TAG environment variable.
echo "Pushing ${REPONAME}/${IMGNAME}:${TRAVIS_TAG} ...";
sudo docker tag ${IMAGEID} ${REPONAME}/${IMGNAME}:${TRAVIS_TAG}
sudo docker push ${REPONAME}/${IMGNAME}:${TRAVIS_TAG};
echo "Pushing ${REPONAME}/${IMGNAME}:latest ...";
sudo docker tag ${IMAGEID} ${REPONAME}/${IMGNAME}:latest
sudo docker push ${REPONAME}/${IMGNAME}:latest;
fi;
else
echo "No docker credentials provided. Skip uploading ${REPONAME}/${IMGNAME}:${IMGTAG} to docker hub";
fi;

@ -1,180 +0,0 @@
package lib
import (
"context"
"os"
"strings"
"time"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
// InjectChaosInSerialMode injects the AWS SSM chaos in serial mode, that is, on one instance after the other
func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the timestamp at which the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Running SSM command on the instance
for i, ec2ID := range instanceIDList {
//Sending AWS SSM command
log.Info("[Chaos]: Starting the ssm command")
ec2IDList := strings.Fields(ec2ID)
commandId, err := ssm.SendSSMCommand(experimentsDetails, ec2IDList)
if err != nil {
return stacktrace.Propagate(err, "failed to send ssm command")
}
//prepare commands for abort recovery
experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
//wait for the ssm command to get in running state
log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return stacktrace.Propagate(err, "failed to start ssm command")
}
common.SetTargets(ec2ID, "injected", "EC2", chaosDetails)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//wait for the ssm command to get succeeded in the given chaos duration
log.Info("[Wait]: Waiting for the ssm command to get completed")
if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return stacktrace.Propagate(err, "ssm command did not complete successfully")
}
common.SetTargets(ec2ID, "reverted", "EC2", chaosDetails)
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// InjectChaosInParallelMode injects the AWS SSM chaos in parallel mode, that is, on all instances at once
func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the timestamp at which the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Sending AWS SSM command
log.Info("[Chaos]: Starting the ssm command")
commandId, err := ssm.SendSSMCommand(experimentsDetails, instanceIDList)
if err != nil {
return stacktrace.Propagate(err, "failed to send ssm command")
}
//prepare commands for abort recovery
experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
for _, ec2ID := range instanceIDList {
//wait for the ssm command to get in running state
log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return stacktrace.Propagate(err, "failed to start ssm command")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
for _, ec2ID := range instanceIDList {
//wait for the ssm command to get succeeded in the given chaos duration
log.Info("[Wait]: Waiting for the ssm command to get completed")
if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return stacktrace.Propagate(err, "ssm command did not complete successfully")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// AbortWatcher will be watching for the abort signal and revert the chaos
func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, abort chan os.Signal) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
switch {
case len(experimentsDetails.CommandIDs) != 0:
for _, commandId := range experimentsDetails.CommandIDs {
if err := ssm.CancelCommand(commandId, experimentsDetails.Region); err != nil {
log.Errorf("[Abort]: Failed to cancel command, recovery failed: %v", err)
}
}
default:
log.Info("[Abort]: No SSM Command found to cancel")
}
if err := ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region); err != nil {
log.Errorf("Failed to delete ssm document: %v", err)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

@ -1,91 +0,0 @@
package ssm
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareAWSSSMChaosByID contains the preparation and injection steps for the experiment
func PrepareAWSSSMChaosByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSSSMFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// create the SSM document and upload it to AWS Systems Manager
if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "could not create and upload the ssm document")
}
experimentsDetails.IsDocsUploaded = true
log.Info("[Info]: SSM docs uploaded successfully")
// watching for the abort signal and revert the chaos
go lib.AbortWatcher(experimentsDetails, abort)
//get the instance id or list of instance ids
instanceIDList := strings.Split(experimentsDetails.EC2InstanceID, ",")
if experimentsDetails.EC2InstanceID == "" || len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// delete the SSM document from AWS Systems Manager
err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to delete ssm doc")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}

@ -1,86 +0,0 @@
package ssm
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
// PrepareAWSSSMChaosByTag contains the preparation and injection steps for the experiment
func PrepareAWSSSMChaosByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// create the SSM document and upload it to AWS Systems Manager
if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "could not create and upload the ssm document")
}
experimentsDetails.IsDocsUploaded = true
log.Info("[Info]: SSM docs uploaded successfully")
// watching for the abort signal and revert the chaos
go lib.AbortWatcher(experimentsDetails, abort)
instanceIDList := common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, experimentsDetails.TargetInstanceIDList)
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIDList))
if len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// delete the SSM document from AWS Systems Manager
err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to delete ssm doc")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}

@ -1,299 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/Azure/azure-sdk-for-go/profiles/latest/compute/mgmt/compute"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
diskStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/disk"
instanceStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareChaos contains the preparation and injection steps for the experiment
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureDiskLossFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//get the disk name or list of disk names
diskNameList := strings.Split(experimentsDetails.VirtualDiskNames, ",")
if experimentsDetails.VirtualDiskNames == "" || len(diskNameList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume names found to detach"}
}
instanceNamesWithDiskNames, err := diskStatus.GetInstanceNameForDisks(diskNameList, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup)
if err != nil {
return stacktrace.Propagate(err, "error fetching attached instances for disks")
}
// Get the instance name with attached disks
attachedDisksWithInstance := make(map[string]*[]compute.DataDisk)
for instanceName := range instanceNamesWithDiskNames {
attachedDisksWithInstance[instanceName], err = diskStatus.GetInstanceDiskList(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, experimentsDetails.ScaleSet, instanceName)
if err != nil {
return stacktrace.Propagate(err, "error fetching virtual disks")
}
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, attachedDisksWithInstance, instanceNamesWithDiskNames, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
}
return nil
}
// injectChaosInParallelMode injects the Azure disk loss chaos in parallel mode, that is, on all disks at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInParallelMode")
defer span.End()
// ChaosStartTimeStamp contains the timestamp at which the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disk"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Detaching the virtual disks
log.Info("[Chaos]: Detaching the virtual disks from the instances")
for instanceName, diskNameList := range instanceNamesWithDiskNames {
if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
return stacktrace.Propagate(err, "failed to detach disks")
}
}
// Waiting for disk to be detached
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk detachment check failed")
}
}
}
// Updating the result details
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
common.SetTargets(diskName, "detached", "VirtualDisk", chaosDetails)
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for the chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Attaching the virtual disks to the instance
log.Info("[Chaos]: Attaching the Virtual disks back to the instances")
for instanceName, diskNameList := range attachedDisksWithInstance {
if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
return stacktrace.Propagate(err, "virtual disk attachment failed")
}
// Wait for disk to be attached
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk attachment check failed")
}
}
}
// Updating the result details
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
}
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// injectChaosInSerialMode will inject the Azure disk loss chaos in serial mode, that is, one disk after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInSerialMode")
defer span.End()
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disks"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for instanceName, diskNameList := range instanceNamesWithDiskNames {
for i, diskName := range diskNameList {
// Converting diskName to list type because DetachDisks() accepts a list type
diskNameToList := []string{diskName}
// Detaching the virtual disks
log.Infof("[Chaos]: Detaching %v from the instance", diskName)
if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameToList); err != nil {
return stacktrace.Propagate(err, "failed to detach disks")
}
// Waiting for disk to be detached
log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk detachment check failed")
}
common.SetTargets(diskName, "detached", "VirtualDisk", chaosDetails)
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for the chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Attaching the virtual disks to the instance
log.Infof("[Chaos]: Attaching %v back to the instance", diskName)
if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, attachedDisksWithInstance[instanceName]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
// Waiting for disk to be attached
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk attachment check failed")
}
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// abortWatcher will be watching for the abort signal and revert the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, attachedDisksWithInstance map[string]*[]compute.DataDisk, instanceNamesWithDiskNames map[string][]string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
log.Info("[Abort]: Attaching disk(s) as abort signal received")
for instanceName, diskList := range attachedDisksWithInstance {
// Checking for provisioning state of the vm instances
err = retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
status, err := instanceStatus.GetAzureInstanceProvisionStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet)
if err != nil {
return stacktrace.Propagate(err, "failed to get instance")
}
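if status != "Provisioning succeeded" {
// err is nil here, and stacktrace.Propagate(nil, ...) returns nil, so create a new error to keep retrying
return stacktrace.NewError("instance is updating, waiting for instance to finish update")
}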
return nil
})
if err != nil {
log.Errorf("[Error]: Instance is still in 'updating' state after timeout, re-attach might fail")
}
log.Infof("[Abort]: Attaching disk(s) to instance: %v", instanceName)
for _, disk := range *diskList {
diskStatusString, err := diskStatus.GetDiskStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, *disk.Name)
if err != nil {
log.Errorf("Failed to get disk status: %v", err)
}
if diskStatusString != "Attached" {
if err := diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskList); err != nil {
log.Errorf("Failed to attach disk, manual revert required: %v", err)
} else {
common.SetTargets(*disk.Name, "re-attached", "VirtualDisk", chaosDetails)
}
}
}
}
log.Infof("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

@ -1,293 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
azureCommon "github.com/litmuschaos/litmus-go/pkg/cloud/azure/common"
azureStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareAzureStop will initialize instanceNameList and start chaos injection based on sequence method selected
func PrepareAzureStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// get the instance name or list of instance names
instanceNameList := strings.Split(experimentsDetails.AzureInstanceNames, ",")
if experimentsDetails.AzureInstanceNames == "" || len(instanceNameList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance name found to stop"}
}
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceNameList)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will inject the Azure instance termination in serial mode that is one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceName list, %v", instanceNameList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// PowerOff the instance serially
for i, vmName := range instanceNameList {
// Stopping the Azure instance
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to stop the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to stop the Azure instance")
}
}
// Wait for Azure instance to completely stop
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "instance poweroff status check failed")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for Chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
// Starting the Azure instance
log.Info("[Chaos]: Starting back the Azure instance")
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
}
// Wait for Azure instance to get in running state
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "instance power on status check failed")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode will inject the Azure instance termination in parallel mode that is all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInParallelMode")
defer span.End()
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceName list, %v", instanceNameList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// PowerOff the instances in parallel
for _, vmName := range instanceNameList {
// Stopping the Azure instance
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to stop Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to stop Azure instance")
}
}
}
// Wait for all Azure instances to completely stop
for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "instance poweroff status check failed")
}
}
// Run probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for Chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
// Starting the Azure instance
for _, vmName := range instanceNameList {
log.Infof("[Chaos]: Starting back the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
}
}
// Wait for Azure instance to get in running state
for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "instance power on status check failed")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string) {
<-abort
var instanceState string
log.Info("[Abort]: Chaos Revert Started")
for _, vmName := range instanceNameList {
if experimentsDetails.ScaleSet == "enable" {
scaleSetName, vmId := azureCommon.GetScaleSetNameAndInstanceId(vmName)
instanceState, err = azureStatus.GetAzureScaleSetInstanceStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, scaleSetName, vmId)
} else {
instanceState, err = azureStatus.GetAzureInstanceStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
}
if err != nil {
log.Errorf("[Abort]: Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "VM running" && instanceState != "VM starting" {
log.Info("[Abort]: Waiting for the Azure instance to get down")
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Instance power off status check failed: %v", err)
}
log.Info("[Abort]: Starting Azure instance as abort signal received")
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
}
}
}
log.Info("[Abort]: Waiting for the Azure instance to start")
err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
if err != nil {
log.Errorf("[Abort]: Instance power on status check failed: %v", err)
log.Errorf("[Abort]: Azure instance %v failed to start after an abort signal is received", vmName)
}
}
log.Infof("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

@ -1,280 +0,0 @@
package helper
import (
"context"
"fmt"
"os/exec"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var err error
// Helper injects the container-kill chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateContainerKillFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
//Fetching all the ENV passed in the helper pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
if err := killContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// killContainer kills the target application containers
// it keeps killing them until the chaos duration elapses
// the execution stops once the elapsed time passes the given chaos duration
func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
targets = append(targets, td)
log.Infof("Injecting chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
if err := killIterations(targets, experimentsDetails, clients, eventsDetails, chaosDetails, resultDetails); err != nil {
return err
}
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil
}
func killIterations(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
var containerIds []string
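// iterate by index so that RestartCountBefore is stored in the slice and is visible to validate() below
for i := range targets {
t := &targets[i]
t.RestartCountBefore, err = getRestartCount(*t, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get container restart count")
}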
containerId, err := common.GetContainerID(t.Namespace, t.Name, t.TargetContainer, clients, t.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": t.Name,
"ContainerName": t.TargetContainer,
"RestartCountBefore": t.RestartCountBefore,
})
containerIds = append(containerIds, containerId)
}
if err := kill(experimentsDetails, containerIds, clients, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not kill target container")
}
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != 0 {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
}
for _, t := range targets {
if err := validate(t, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not verify restart count")
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "targeted", "pod", t.Name); err != nil {
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
func kill(experimentsDetails *experimentTypes.ExperimentDetails, containerIds []string, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch experimentsDetails.ContainerRuntime {
case "docker":
if err := stopDockerContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
case "containerd", "crio":
if err := stopContainerdContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName, experimentsDetails.Timeout); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported container runtime %s", experimentsDetails.ContainerRuntime)}
}
return nil
}
func validate(t targetDetails, timeout, delay int, clients clients.ClientSets) error {
//Check the status of restarted container
if err := common.CheckContainerStatus(t.Namespace, t.Name, timeout, delay, clients, t.Source); err != nil {
return err
}
// It will verify that the restart count of container should increase after chaos injection
return verifyRestartCount(t, timeout, delay, clients, t.RestartCountBefore)
}
// stopContainerdContainer stops the application containers through crictl
func stopContainerdContainer(containerIDs []string, socketPath, signal, source string, timeout int) error {
if signal != "SIGKILL" && signal != "SIGTERM" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: source, Reason: fmt.Sprintf("unsupported signal %s, use either SIGTERM or SIGKILL", signal)}
}
cmd := exec.Command("sudo", "crictl", "-i", fmt.Sprintf("unix://%s", socketPath), "-r", fmt.Sprintf("unix://%s", socketPath), "stop")
if signal == "SIGKILL" {
cmd.Args = append(cmd.Args, "--timeout=0")
} else if timeout != -1 {
cmd.Args = append(cmd.Args, fmt.Sprintf("--timeout=%v", timeout))
}
cmd.Args = append(cmd.Args, containerIDs...)
return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
}
// stopDockerContainer kills the application containers through the docker CLI
func stopDockerContainer(containerIDs []string, socketPath, signal, source string) error {
cmd := exec.Command("sudo", "docker", "--host", fmt.Sprintf("unix://%s", socketPath), "kill", "--signal", signal)
cmd.Args = append(cmd.Args, containerIDs...)
return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
}
// getRestartCount returns the restart count of the target container
func getRestartCount(target targetDetails, clients clients.ClientSets) (int, error) {
pod, err := clients.GetPod(target.Namespace, target.Name, 180, 2)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", target.Name, target.Namespace), Reason: err.Error()}
}
restartCount := 0
for _, container := range pod.Status.ContainerStatuses {
if container.Name == target.TargetContainer {
restartCount = int(container.RestartCount)
break
}
}
return restartCount, nil
}
// verifyRestartCount verifies that the target container has restarted after chaos injection
func verifyRestartCount(t targetDetails, timeout, delay int, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0
return retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == t.TargetContainer {
restartCountAfter = int(container.RestartCount)
break
}
}
if restartCountAfter <= restartCountBefore {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: "target container is not restarted after kill"}
}
log.Infof("restartCount of target container after chaos injection: %v", strconv.Itoa(restartCountAfter))
return nil
})
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.Signal = types.Getenv("SIGNAL", "SIGKILL")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.ContainerAPITimeout, _ = strconv.Atoi(types.Getenv("CONTAINER_API_TIMEOUT", "-1"))
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
RestartCountBefore int
Source string
}
func isContextDeadlineExceeded(err error) bool {
return strings.Contains(err.Error(), "context deadline exceeded")
}

@ -0,0 +1,298 @@
package main
import (
"fmt"
"os"
"os/exec"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/openebs/maya/pkg/util/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
)
func main() {
experimentsDetails := experimentTypes.ExperimentDetails{}
clients := clients.ClientSets{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Fatalf("Unable to Get the kubeconfig due to %v", err)
}
//Fetching all the ENV passed for the runner pod
log.Info("[PreReq]: Getting the ENV variables")
GetENV(&experimentsDetails, "container-kill")
// Initialise the chaos attributes
experimentEnv.InitialiseChaosVariables(&chaosDetails, &experimentsDetails)
err := KillContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails)
if err != nil {
log.Fatalf("helper pod failed due to err: %v", err)
}
}
// KillContainer kills the target application container.
// It repeats the kill once per iteration, waiting for the chaos interval in between,
// and stops early once the elapsed time reaches the total chaos duration.
func KillContainer(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// getting the current timestamp, it will help to keep track of the total chaos duration
ChaosStartTimeStamp := time.Now().Unix()
for iteration := 0; iteration < experimentsDetails.Iterations; iteration++ {
//Obtain the pod ID of the application pod
podID, err := GetPodID(experimentsDetails)
if err != nil {
return errors.Errorf("Unable to get the pod id %v", err)
}
//GetRestartCount return the restart count of target container
restartCountBefore, err := GetRestartCount(experimentsDetails, experimentsDetails.TargetPod, clients)
if err != nil {
return err
}
//Obtain the container ID through Pod
// this id will be used to select the container for kill
containerID, err := GetContainerID(experimentsDetails, podID)
if err != nil {
return errors.Errorf("Unable to get the container id, %v", err)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": experimentsDetails.TargetPod,
"ContainerName": experimentsDetails.TargetContainer,
"RestartCountBefore": restartCountBefore,
})
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// killing the application container
StopContainer(containerID)
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != 0 {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
waitForChaosInterval(experimentsDetails)
}
//Check the status of restarted container
err = CheckContainerStatus(experimentsDetails, clients, experimentsDetails.TargetPod)
if err != nil {
return errors.Errorf("Application container is not in running state, %v", err)
}
// Verify that the restart count of the container has increased after chaos injection
err = VerifyRestartCount(experimentsDetails, experimentsDetails.TargetPod, clients, restartCountBefore)
if err != nil {
return err
}
// generating the total duration of the experiment run
ChaosCurrentTimeStamp := time.Now().Unix()
chaosDiffTimeStamp := ChaosCurrentTimeStamp - ChaosStartTimeStamp
// terminating the execution once the elapsed time reaches the total chaos duration
if int(chaosDiffTimeStamp) >= experimentsDetails.ChaosDuration {
break
}
}
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil
}
//GetPodID derive the pod-id of the application pod
func GetPodID(experimentsDetails *experimentTypes.ExperimentDetails) (string, error) {
cmd := exec.Command("crictl", "pods")
stdout, _ := cmd.Output()
pods := RemoveExtraSpaces(stdout)
for i := 0; i < len(pods)-1; i++ {
attributes := strings.Split(pods[i], " ")
if attributes[3] == experimentsDetails.TargetPod {
return attributes[0], nil
}
}
return "", fmt.Errorf("%v pod is unavailable", experimentsDetails.TargetPod)
}
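
The lookup above indexes `attributes[3]` without checking the row length and discards the error from `cmd.Output()`. A minimal sketch of a bounds-checked lookup over rows that are already single-space separated; the column position, the function name `findPodID`, and the sample rows are assumptions made for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// nameColumn is the zero-based position of the NAME column that the code
// above reads from `crictl pods` rows; it is an assumption about the layout.
const nameColumn = 3

// findPodID scans rows whose columns are separated by single spaces and
// returns the first column (the pod ID) of the row whose NAME column equals
// target. Rows with too few columns, such as blank lines, are skipped
// instead of causing an index-out-of-range panic.
func findPodID(rows []string, target string) (string, error) {
	for _, row := range rows {
		cols := strings.Split(strings.TrimSpace(row), " ")
		if len(cols) <= nameColumn {
			continue
		}
		if cols[nameColumn] == target {
			return cols[0], nil
		}
	}
	return "", fmt.Errorf("pod %q not found in runtime output", target)
}

// sampleRows returns made-up rows, not output captured from a real node.
func sampleRows() []string {
	return []string{
		"PODID CREATED STATE NAME NAMESPACE ATTEMPT",
		"a1b2c3 2hoursago Ready nginx-0 default 0",
		"",
	}
}

func main() {
	id, err := findPodID(sampleRows(), "nginx-0")
	fmt.Println(id, err)
}
```

In a real helper the error returned by `cmd.Output()` would be checked before parsing, so that a missing `crictl` binary is reported as such rather than as "pod is unavailable".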
//GetContainerID derive the container id of the application container
func GetContainerID(experimentsDetails *experimentTypes.ExperimentDetails, podID string) (string, error) {
cmd := exec.Command("crictl", "ps")
stdout, _ := cmd.Output()
containers := RemoveExtraSpaces(stdout)
for i := 0; i < len(containers)-1; i++ {
attributes := strings.Split(containers[i], " ")
if attributes[4] == experimentsDetails.TargetContainer && attributes[6] == podID {
return attributes[0], nil
}
}
return "", fmt.Errorf("%v container is unavailable", experimentsDetails.TargetContainer)
}
//StopContainer kill the application container
func StopContainer(containerID string) {
cmd := exec.Command("crictl", "stop", string(containerID))
stdout, _ := cmd.Output()
fmt.Print(string(stdout))
}
// CheckContainerStatus checks the status of the application container
func CheckContainerStatus(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName string) error {
err := retry.
Times(90).
Wait(2 * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to list the pod, due to %v", err)
}
for _, container := range pod.Status.ContainerStatuses {
if container.Ready != true {
return errors.Errorf("containers are not yet in running state")
}
log.InfoWithValues("The running status of container are as follows", logrus.Fields{
"container": container.Name, "Pod": pod.Name, "Status": pod.Status.Phase})
}
return nil
})
if err != nil {
return err
}
return nil
}
// RemoveExtraSpaces remove all the extra spaces present in output of crictl commands
func RemoveExtraSpaces(arr []byte) []string {
bytesSlice := make([]byte, len(arr))
index := 0
count := 0
for i := 0; i < len(arr); i++ {
count = 0
for arr[i] == 32 {
count++
i++
if i >= len(arr) {
break
}
}
if count > 1 {
bytesSlice[index] = 32
index++
}
bytesSlice[index] = arr[i]
index++
}
return strings.Split(string(bytesSlice), "\n")
}
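
As written, `RemoveExtraSpaces` drops single spaces, collapses longer runs to one space, and appears to index past the end of the input when it ends with a space. A minimal sketch of a different technique, splitting each line on runs of two or more spaces with a regular expression; it assumes columns are always separated by at least two spaces, and `splitColumns` and the sample line are illustrative names and data:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// columnGap matches a run of two or more spaces, which is assumed to be
// the separator between columns in the tabular output.
var columnGap = regexp.MustCompile(` {2,}`)

// splitColumns splits one line of tabular output into its columns.
// Single spaces inside a value, such as "2 hours ago", are preserved.
func splitColumns(line string) []string {
	line = strings.TrimSpace(line)
	if line == "" {
		return nil
	}
	return columnGap.Split(line, -1)
}

// sampleLine is illustrative output, not captured from a real node.
func sampleLine() string {
	return "a1b2c3     2 hours ago     Ready     nginx-0     default     0  "
}

func main() {
	for i, col := range splitColumns(sampleLine()) {
		fmt.Printf("%d: %q\n", i, col)
	}
}
```

Unlike the byte loop, this keeps "2 hours ago" as one readable column, so callers that compare column values would see the spaces preserved.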
//waitForChaosInterval waits for the chaos interval (in seconds)
func waitForChaosInterval(experimentsDetails *experimentTypes.ExperimentDetails) {
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
}
//GetRestartCount return the restart count of target container
func GetRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets) (int, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podName, v1.GetOptions{})
if err != nil {
return 0, err
}
restartCount := 0
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
restartCount = int(container.RestartCount)
break
}
}
return restartCount, nil
}
//VerifyRestartCount verifies that the target container has been restarted after chaos injection
func VerifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0
err := retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podName, v1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to get the application pod, due to %v", err)
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
restartCountAfter = int(container.RestartCount)
break
}
}
if restartCountAfter <= restartCountBefore {
return errors.Errorf("Target container is not restarted")
}
return nil
})
log.Infof("restartCount of target container after chaos injection: %v", strconv.Itoa(restartCountAfter))
return err
}
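
Both status checks above rely on a retry helper that polls until a condition holds. A minimal standard-library sketch of that polling pattern follows; it is a stand-in written for illustration, not the API of the `retry` package imported in the diff, and `pollUntilRestarted` simulates the restart count instead of querying a cluster:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry calls fn up to attempts times, sleeping wait between failed tries.
// It returns nil on the first success, otherwise the last error from fn.
func retry(attempts int, wait time.Duration, fn func(attempt int) error) error {
	if attempts <= 0 {
		return errors.New("attempts must be positive")
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(i); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(wait)
		}
	}
	return err
}

// pollUntilRestarted simulates a container whose restart count increases on
// the poll numbered restartsAtPoll. It returns the number of polls made.
func pollUntilRestarted(before, restartsAtPoll int) (int, error) {
	polls := 0
	err := retry(5, time.Millisecond, func(int) error {
		polls++
		current := before
		if polls >= restartsAtPoll {
			current = before + 1
		}
		if current <= before {
			return errors.New("target container is not restarted")
		}
		return nil
	})
	return polls, err
}

func main() {
	polls, err := pollUntilRestarted(2, 3)
	fmt.Println(polls, err)
}
```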
//GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails, name string) {
experimentDetails.ExperimentName = name
experimentDetails.AppNS = Getenv("APP_NS", "")
experimentDetails.TargetContainer = Getenv("APP_CONTAINER", "")
experimentDetails.TargetPod = Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.Iterations, _ = strconv.Atoi(Getenv("ITERATIONS", "3"))
experimentDetails.ChaosNamespace = Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = Getenv("CHAOS_ENGINE", "")
experimentDetails.AppLabel = Getenv("APP_LABEL", "")
experimentDetails.ChaosUID = clientTypes.UID(Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = Getenv("POD_NAME", "")
}
// Getenv fetch the env and set the default value, if any
func Getenv(key string, defaultValue string) string {
value := os.Getenv(key)
if value == "" {
value = defaultValue
}
return value
}
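
`GetENV` above discards the error from `strconv.Atoi`, so a malformed value such as `CHAOS_INTERVAL=10s` silently becomes 0. A minimal sketch of a variant that reports the problem; `parseIntOr` and `envInt` are hypothetical helpers, not functions from the repository:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseIntOr converts raw to an int. An empty string yields def; a value
// that is not a valid integer yields an error instead of silently becoming 0.
func parseIntOr(raw string, def int) (int, error) {
	if raw == "" {
		return def, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid integer %q: %w", raw, err)
	}
	return n, nil
}

// envInt reads the environment variable key as an integer with a default.
func envInt(key string, def int) (int, error) {
	n, err := parseIntOr(os.Getenv(key), def)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	return n, nil
}

func main() {
	duration, err := envInt("TOTAL_CHAOS_DURATION", 30)
	fmt.Println(duration, err)
}
```

Whether a bad value should abort the helper or fall back to the default is a design choice; the sketch only makes the failure visible to the caller.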


@ -1,51 +1,28 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/math"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
"github.com/pkg/errors"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareContainerKillFault")
defer span.End()
//PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var err error
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//Set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The tunables are:", logrus.Fields{
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
//Waiting for the ramp time before chaos injection
@ -56,30 +33,58 @@ func PrepareContainerKill(ctx context.Context, experimentsDetails *experimentTyp
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
err = GetServiceAccount(experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
//Getting the iteration count for the container-kill
GetIterations(experimentsDetails)
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get annotations, err: %v", err)
}
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", clients, experimentsDetails.ChaosDuration+experimentsDetails.ChaosInterval+60, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed, err: %v", err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
err = common.DeleteAllPod("app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
@ -90,98 +95,60 @@ func PrepareContainerKill(ctx context.Context, experimentsDetails *experimentTyp
return nil
}
// injectChaosInSerialMode kill the container of all target application serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
//GetIterations derive the iterations value from given parameters
func GetIterations(experimentsDetails *experimentTypes.ExperimentDetails) {
var Iterations int
if experimentsDetails.ChaosInterval != 0 {
Iterations = experimentsDetails.ChaosDuration / experimentsDetails.ChaosInterval
}
experimentsDetails.Iterations = math.Maximum(Iterations, 1)
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode kill the container of all target application in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
// GetServiceAccount find the serviceAccountName for the helper pod
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateContainerKillFaultHelperPod")
defer span.End()
//GetTargetContainer will fetch the container name from application pod
//This container will be used as target container
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
return pod.Spec.Containers[0].Name, nil
}
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, podName, nodeName, runID string) error {
privilegedEnable := false
if experimentsDetails.ContainerRuntime == "crio" {
privilegedEnable = true
}
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper",
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "cri-socket",
@ -191,26 +158,37 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
},
},
},
{
Name: "cri-config",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/etc/crictl.yaml",
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
ImagePullPolicy: apiv1.PullAlways,
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"./helpers -name container-kill",
"./experiments/container-killer",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets),
Env: GetPodEnv(experimentsDetails, podName),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
MountPath: experimentsDetails.SocketPath,
},
{
Name: "cri-config",
MountPath: "/etc/crictl.yaml",
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
@ -220,46 +198,49 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// GetPodEnv derive all the env required for the helper pod
func GetPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
var envVar []apiv1.EnvVar
ENVList := map[string]string{
"APP_NS": experimentsDetails.AppNS,
"APP_POD": podName,
"APP_CONTAINER": experimentsDetails.TargetContainer,
"TOTAL_CHAOS_DURATION": strconv.Itoa(experimentsDetails.ChaosDuration),
"CHAOS_NAMESPACE": experimentsDetails.ChaosNamespace,
"CHAOS_ENGINE": experimentsDetails.EngineName,
"CHAOS_UID": string(experimentsDetails.ChaosUID),
"CHAOS_INTERVAL": strconv.Itoa(experimentsDetails.ChaosInterval),
"ITERATIONS": strconv.Itoa(experimentsDetails.Iterations),
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
for key, value := range ENVList {
var perEnv apiv1.EnvVar
perEnv.Name = key
perEnv.Value = value
envVar = append(envVar, perEnv)
}
// Getting experiment pod name from downward API
experimentPodName := GetValueFromDownwardAPI("v1", "metadata.name")
var downwardEnv apiv1.EnvVar
downwardEnv.Name = "POD_NAME"
downwardEnv.ValueFrom = &experimentPodName
envVar = append(envVar, downwardEnv)
return nil
return envVar
}
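
`GetPodEnv` ranges over a map, and Go does not define map iteration order, so the env list of the helper pod can come out in a different order on each run. A minimal sketch of emitting the variables in sorted order; `envVar` is a local stand-in for the Kubernetes `EnvVar` type so that the example needs no external modules, and `sortedEnv` is an illustrative name:

```go
package main

import (
	"fmt"
	"sort"
)

// envVar is a local stand-in for the Kubernetes EnvVar type.
type envVar struct {
	Name  string
	Value string
}

// sortedEnv converts a map into a slice ordered by variable name, so the
// generated pod spec is the same on every run.
func sortedEnv(values map[string]string) []envVar {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	out := make([]envVar, 0, len(keys))
	for _, k := range keys {
		out = append(out, envVar{Name: k, Value: values[k]})
	}
	return out
}

func main() {
	env := sortedEnv(map[string]string{
		"TOTAL_CHAOS_DURATION": "30",
		"APP_NS":               "default",
		"CHAOS_INTERVAL":       "10",
	})
	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

A stable order does not change behaviour inside the container, but it keeps generated manifests easy to compare between runs.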
// getPodEnv derive all the env required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CHAOS_INTERVAL", strconv.Itoa(experimentsDetails.ChaosInterval)).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("SIGNAL", experimentsDetails.Signal).
SetEnv("STATUS_CHECK_DELAY", strconv.Itoa(experimentsDetails.Delay)).
SetEnv("STATUS_CHECK_TIMEOUT", strconv.Itoa(experimentsDetails.Timeout)).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("CONTAINER_API_TIMEOUT", strconv.Itoa(experimentsDetails.ContainerAPITimeout)).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
// SetChaosTunables will setup a random value within a given range of values
// If the value is not provided in range it'll setup the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
// GetValueFromDownwardAPI returns the value from downwardApi
func GetValueFromDownwardAPI(apiVersion string, fieldPath string) apiv1.EnvVarSource {
downwardENV := apiv1.EnvVarSource{
FieldRef: &apiv1.ObjectFieldSelector{
APIVersion: apiVersion,
FieldPath: fieldPath,
},
}
return downwardENV
}


@ -1,374 +0,0 @@
package helper
import (
"context"
"fmt"
"os"
"os/exec"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
"k8s.io/apimachinery/pkg/api/resource"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var inject, abort chan os.Signal
// Helper injects the disk-fill chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateDiskFillFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed in the helper pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := diskFill(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// diskFill contains steps to inject disk-fill chaos
func diskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
// Derive the container id of the target container
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.TargetPID, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return err
}
td.SizeToFill, err = getDiskSizeToFill(td, experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get disk size to fill")
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"SizeToFill(KB)": td.SizeToFill,
"TargetContainer": td.TargetContainer,
})
targets = append(targets, td)
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// watching for the abort signal and revert the chaos
go abortWatcher(targets, experimentsDetails, clients, resultDetails.Name)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
for _, t := range targets {
if t.SizeToFill > 0 {
if err := fillDisk(t, experimentsDetails.DataBlockSize); err != nil {
return stacktrace.Propagate(err, "could not fill ephemeral storage")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertDiskFill(t, clients); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
} else {
log.Warn("No required free space found!")
}
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment")
var errList []string
for _, t := range targets {
// If the target pod has been evicted, delete it;
// if it is still running, delete the files that were created during chaos execution
if err = revertDiskFill(t, clients); err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// fillDisk fills the ephemeral storage by creating a file with dd
func fillDisk(t targetDetails, bs int) error {
// Write SizeToFill KB of random data in blocks of bs KB (bs comes from DATA_BLOCK_SIZE)
log.Infof("[Fill]: Filling ephemeral storage, size: %vKB", t.SizeToFill)
dd := fmt.Sprintf("sudo dd if=/dev/urandom of=/proc/%v/root/home/diskfill bs=%vK count=%v", t.TargetPID, bs, strconv.Itoa(t.SizeToFill/bs))
log.Infof("dd: {%v}", dd)
cmd := exec.Command("/bin/bash", "-c", dd)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: string(out)}
}
return nil
}
// getEphemeralStorageAttributes derives the ephemeral storage limit of the target container from the target pod
func getEphemeralStorageAttributes(t targetDetails, clients clients.ClientSets) (int64, error) {
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
var ephemeralStorageLimit int64
containers := pod.Spec.Containers
// Extract the ephemeral storage limit of the target container
// The value is returned in KB
for _, container := range containers {
if container.Name == t.TargetContainer {
ephemeralStorageLimit = container.Resources.Limits.StorageEphemeral().ToDec().ScaledValue(resource.Kilo)
break
}
}
return ephemeralStorageLimit, nil
}
// filterUsedEphemeralStorage filters out the used ephemeral storage from the given du output
func filterUsedEphemeralStorage(ephemeralStorageDetails string) (int, error) {
// The du output lists every subdirectory of the target container, one per line
ephemeralStorageAll := strings.Split(ephemeralStorageDetails, "\n")
// du ends its output with a newline, so the split must yield at least two elements
if len(ephemeralStorageAll) < 2 {
return 0, fmt.Errorf("unexpected du output: %q", ephemeralStorageDetails)
}
// The second-to-last element holds the total for the main directory
ephemeralStorageAllDiskFill := strings.Split(ephemeralStorageAll[len(ephemeralStorageAll)-2], "\t")[0]
// converting the string to an integer
ephemeralStorageSize, err := strconv.Atoi(ephemeralStorageAllDiskFill)
return ephemeralStorageSize, err
}
// getSizeToBeFilled computes the ephemeral storage size (in KB) that still needs to be filled
func getSizeToBeFilled(experimentsDetails *experimentTypes.ExperimentDetails, usedEphemeralStorageSize int, ephemeralStorageLimit int) int {
var requirementToBeFill int
switch ephemeralStorageLimit {
case 0:
ephemeralStorageMebibytes, _ := strconv.Atoi(experimentsDetails.EphemeralStorageMebibytes)
requirementToBeFill = ephemeralStorageMebibytes * 1024
default:
// derive the target size from the storage limit and the requested fill percentage
fillPercentage, _ := strconv.Atoi(experimentsDetails.FillPercentage)
requirementToBeFill = (ephemeralStorageLimit * fillPercentage) / 100
}
needToBeFilled := requirementToBeFill - usedEphemeralStorageSize
return needToBeFilled
}
// revertDiskFill deletes the target pod if it has been evicted.
// If the target pod is still running, it deletes the files that were created during chaos execution.
func revertDiskFill(t targetDetails, clients clients.ClientSets) error {
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
podReason := pod.Status.Reason
if podReason == "Evicted" {
// Deleting the pod as pod is already evicted
log.Warn("Target pod is evicted, deleting the pod")
if err := clients.KubeClient.CoreV1().Pods(t.Namespace).Delete(context.Background(), t.Name, v1.DeleteOptions{}); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to delete target pod after eviction :%s", err.Error())}
}
} else {
// deleting the files after chaos execution
rm := fmt.Sprintf("sudo rm -rf /proc/%v/root/home/diskfill", t.TargetPID)
cmd := exec.Command("/bin/bash", "-c", rm)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to cleanup ephemeral storage: %s", string(out))}
}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
return nil
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.FillPercentage = types.Getenv("FILL_PERCENTAGE", "")
experimentDetails.EphemeralStorageMebibytes = types.Getenv("EPHEMERAL_STORAGE_MEBIBYTES", "")
experimentDetails.DataBlockSize, _ = strconv.Atoi(types.Getenv("DATA_BLOCK_SIZE", "256"))
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
}
// abortWatcher waits for the abort signal and reverts the chaos when it arrives
func abortWatcher(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultName string) {
// wait until the abort signal is received
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
err := revertDiskFill(t, clients)
if err != nil {
log.Errorf("unable to kill disk-fill process, err :%v", err)
continue
}
if err = result.AnnotateChaosResult(resultName, experimentsDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
func getDiskSizeToFill(t targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int, error) {
usedEphemeralStorageSize, err := getUsedEphemeralStorage(t)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get used ephemeral storage")
}
// getEphemeralStorageAttributes derives the ephemeral storage limit of the target container
ephemeralStorageLimit, err := getEphemeralStorageAttributes(t, clients)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get ephemeral storage attributes")
}
if ephemeralStorageLimit == 0 && experimentsDetails.EphemeralStorageMebibytes == "0" {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: "either provide ephemeral storage limit inside target container or define EPHEMERAL_STORAGE_MEBIBYTES ENV"}
}
// deriving the ephemeral storage size to be filled
sizeTobeFilled := getSizeToBeFilled(experimentsDetails, usedEphemeralStorageSize, int(ephemeralStorageLimit))
return sizeTobeFilled, nil
}
func getUsedEphemeralStorage(t targetDetails) (int, error) {
// derive the used ephemeral storage size from the target container
du := fmt.Sprintf("sudo du /proc/%v/root", t.TargetPID)
cmd := exec.Command("/bin/bash", "-c", du)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", string(out))}
}
ephemeralStorageDetails := string(out)
// filtering out the used ephemeral storage from the output of du command
usedEphemeralStorageSize, err := filterUsedEphemeralStorage(ephemeralStorageDetails)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", err.Error())}
}
log.Infof("used ephemeral storage space: %vKB", strconv.Itoa(usedEphemeralStorageSize))
return usedEphemeralStorageSize, nil
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
SizeToFill int
TargetPID int
Source string
}
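
The size calculation above can be followed more easily in isolation. The sketch below restates the arithmetic of getSizeToBeFilled and the dd block count used in fillDisk, assuming all sizes are in KB. The names sizeToFillKB and ddBlockCount are hypothetical and do not exist in litmus-go, and the guard against a non-positive block size is an addition that the code above does not have.

```go
package main

import "fmt"

// sizeToFillKB mirrors the arithmetic of getSizeToBeFilled (hypothetical name).
// With no ephemeral storage limit, the target is the fallback size given in MiB;
// otherwise it is fillPercentage percent of the limit. The result may be
// negative when the container already uses more than the target.
func sizeToFillKB(limitKB, usedKB, fillPercentage, fallbackMiB int) int {
	var targetKB int
	if limitKB == 0 {
		targetKB = fallbackMiB * 1024
	} else {
		targetKB = limitKB * fillPercentage / 100
	}
	return targetKB - usedKB
}

// ddBlockCount converts a size in KB into the count argument for dd when the
// block size is blockSizeKB (hypothetical helper). Rejecting a non-positive
// block size is an assumption made here to avoid a division by zero.
func ddBlockCount(sizeKB, blockSizeKB int) (int, error) {
	if blockSizeKB <= 0 {
		return 0, fmt.Errorf("block size must be positive, got %d", blockSizeKB)
	}
	if sizeKB <= 0 {
		return 0, nil
	}
	return sizeKB / blockSizeKB, nil
}

func main() {
	// 80% of a 1,000,000 KB limit with 200,000 KB already in use.
	need := sizeToFillKB(1000000, 200000, 80, 0)
	count, err := ddBlockCount(need, 256)
	fmt.Println(need, count, err)
}
```

Because the division is an integer division, up to one block less than the requested size is written; with a 256 KB block that is at most 255 KB short of the target.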

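The helper reads the used storage from the last line of the du output. A minimal sketch of that parsing step follows, assuming du prints one "<size>\t<path>" line per directory with the total on the final line. parseDuTotal is a hypothetical name and not a function of litmus-go; unlike the original, it trims the output first and returns an error on empty input.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseDuTotal returns the size reported on the last line of du output,
// which is the total for the directory that was passed to du.
// Hypothetical helper for illustration only.
func parseDuTotal(out string) (int, error) {
	lines := strings.Split(strings.TrimSpace(out), "\n")
	last := lines[len(lines)-1]
	if last == "" {
		return 0, fmt.Errorf("empty du output")
	}
	// The first whitespace-separated field of the line is the size.
	fields := strings.Fields(last)
	size, err := strconv.Atoi(fields[0])
	if err != nil {
		return 0, fmt.Errorf("unexpected du line %q: %w", last, err)
	}
	return size, nil
}

func main() {
	out := "4\t/proc/42/root/home\n120\t/proc/42/root\n"
	size, err := parseDuTotal(out)
	fmt.Println(size, err)
}
```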
@ -1,56 +1,36 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareDiskFill contains the preparation steps before chaos injection
func PrepareDiskFill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDiskFillFault")
defer span.End()
//PrepareDiskFill contains the preparation steps before chaos injection
func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var err error
// It will contain all the pod & container details required for exec command
// It will contains all the pod & container details required for exec command
execCommandDetails := exec.PodDetails{}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"FillPercentage": experimentsDetails.FillPercentage,
"EphemeralStorageMebibytes": experimentsDetails.EphemeralStorageMebibytes,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
return errors.Errorf("Unable to get the target pod list due to, err: %v", err)
}
//Waiting for the ramp time before chaos injection
@ -59,32 +39,137 @@ func PrepareDiskFill(ctx context.Context, experimentsDetails *experimentTypes.Ex
common.WaitForDuration(experimentsDetails.RampTime)
}
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// generating the chaos inject event in the chaosengine
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// creating the helper pod to perform disk-fill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, pod.Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
// GetEphemeralStorageAttributes derive the ephemeral storage attributes from the target container
ephemeralStorageLimit, ephemeralStorageRequest, err := GetEphemeralStorageAttributes(experimentsDetails, clients, pod.Name)
if err != nil {
return err
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
// Derive the container id of the target container
containerID, err := GetContainerID(experimentsDetails, clients, pod.Name)
if err != nil {
return err
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
"ephemeralStorageLimit": ephemeralStorageLimit,
"ephemeralStorageRequest": ephemeralStorageRequest,
"ContainerID": containerID,
})
// getting the helper pod name, scheduled on the target node
podName, err := GetHelperPodName(pod, clients, experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper")
if err != nil {
return err
}
// Derive the used ephemeral storage size from the target container
// It will exec inside disk-fill helper pod & derive the used ephemeral storage space
command := "du /diskfill/" + containerID
exec.SetExecCommandAttributes(&execCommandDetails, podName, "disk-fill", experimentsDetails.ChaosNamespace)
ephemeralStorageDetails, err := exec.Exec(&execCommandDetails, clients, strings.Fields(command))
if err != nil {
return errors.Errorf("Unable to get ephemeral storage details, err: %v", err)
}
// filtering out the used ephemeral storage from the output of du command
usedEphemeralStorageSize, err := FilterUsedEphemeralStorage(ephemeralStorageDetails)
if err != nil {
return errors.Errorf("Unable to filter used ephemeral storage size, err: %v", err)
}
log.Infof("used ephemeral storage space: %v", strconv.Itoa(usedEphemeralStorageSize))
// deriving the ephemeral storage size to be filled
sizeTobeFilled := GetSizeToBeFilled(experimentsDetails, usedEphemeralStorageSize, int(ephemeralStorageLimit))
log.Infof("ephemeral storage size to be filled: %v", strconv.Itoa(sizeTobeFilled))
if sizeTobeFilled > 0 {
// Creating files to fill the required ephemeral storage size of block size of 4K
command := "dd if=/dev/urandom of=/diskfill/" + containerID + "/diskfill bs=4K count=" + strconv.Itoa(sizeTobeFilled/4)
_, err = exec.Exec(&execCommandDetails, clients, strings.Fields(command))
if err != nil {
return errors.Errorf("Unable to fill the ephemeral storage, err: %v", err)
}
} else {
log.Warn("No required free space found!, It's Housefull")
}
}
// waiting for the chaos duration
log.Infof("[Wait]: Waiting for the %vs after injecting chaos", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
for _, pod := range targetPodList.Items {
// Derive the container id of the target container
containerID, err := GetContainerID(experimentsDetails, clients, pod.Name)
if err != nil {
return err
}
// getting the helper pod name, scheduled on the target node
podName, err := GetHelperPodName(pod, clients, experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper")
if err != nil {
return err
}
// It will delete the target pod if target pod is evicted
// if target pod is still running then it will delete all the files, which was created earlier during chaos execution
exec.SetExecCommandAttributes(&execCommandDetails, podName, "disk-fill", experimentsDetails.ChaosNamespace)
err = Remedy(experimentsDetails, clients, containerID, pod.Name, &execCommandDetails)
if err != nil {
return errors.Errorf("Unable to perform remedy operation due to %v", err)
}
}
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting all the helper pod")
err = common.DeleteAllPod("app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, %v", err)
}
//Waiting for the ramp time after chaos injection
@ -95,105 +180,42 @@ func PrepareDiskFill(ctx context.Context, experimentsDetails *experimentTypes.Ex
return nil
}
// injectChaosInSerialMode fills the ephemeral storage of all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
//GetTargetContainer will fetch the container name from application pod
// It will return the first container name from the application pod
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
// creating the helper pod to perform disk-fill chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
return pod.Spec.Containers[0].Name, nil
}
// injectChaosInParallelMode fills the ephemeral storage of all target applications in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInParallelMode")
defer span.End()
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID string) error {
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, appNodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDiskFillFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
mountPropagationMode := apiv1.MountPropagationHostToContainer
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper",
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "socket-path",
Name: "udev",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
Path: experimentsDetails.ContainerPath,
},
},
},
@ -202,71 +224,141 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
ImagePullPolicy: apiv1.PullAlways,
Args: []string{
"-c",
"./helpers -name disk-fill",
"sleep",
"10000",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "socket-path",
MountPath: experimentsDetails.SocketPath,
Name: "udev",
MountPath: "/diskfill",
MountPropagation: &mountPropagationMode,
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
},
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// GetEphemeralStorageAttributes derive the ephemeral storage attributes from the target pod
func GetEphemeralStorageAttributes(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, podName string) (int64, int64, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podName, v1.GetOptions{})
if err != nil {
return 0, 0, err
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
var ephemeralStorageLimit, ephemeralStorageRequest int64
containers := pod.Spec.Containers
// Extract the ephemeral storage limit and request of the target container
// Both values are returned in KB
for _, container := range containers {
if container.Name == experimentsDetails.TargetContainer {
ephemeralStorageLimit = container.Resources.Limits.StorageEphemeral().ToDec().ScaledValue(resource.Kilo)
ephemeralStorageRequest = container.Resources.Requests.StorageEphemeral().ToDec().ScaledValue(resource.Kilo)
break
}
}
if ephemeralStorageRequest == 0 || ephemeralStorageLimit == 0 {
return 0, 0, fmt.Errorf("No Ephemeral storage details found inside %v container", experimentsDetails.TargetContainer)
}
return ephemeralStorageLimit, ephemeralStorageRequest, nil
}
// GetContainerID derive the container id of the target container
func GetContainerID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, podName string) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podName, v1.GetOptions{})
if err != nil {
return "", err
}
var containerID string
containers := pod.Status.ContainerStatuses
// filtering out the container id from the details of containers inside containerStatuses of the given pod
// container id is present in the form of <runtime>://<container-id>
for _, container := range containers {
if container.Name == experimentsDetails.TargetContainer {
containerID = strings.Split(container.ContainerID, "//")[1]
break
}
}
return containerID, nil
}
// FilterUsedEphemeralStorage filters out the used ephemeral storage from the given du output
func FilterUsedEphemeralStorage(ephemeralStorageDetails string) (int, error) {
// Filtering out the ephemeral storage size from the output of du command
// It contains details of all subdirectories of target container
ephemeralStorageAll := strings.Split(ephemeralStorageDetails, "\n")
// It will return the details of main directory
ephemeralStorageAllDiskFill := strings.Split(ephemeralStorageAll[len(ephemeralStorageAll)-2], "\t")[0]
// converting the string to an integer
ephemeralStorageSize, err := strconv.Atoi(ephemeralStorageAllDiskFill)
return ephemeralStorageSize, err
}
// GetSizeToBeFilled computes the ephemeral storage size (in KB) that still needs to be filled
func GetSizeToBeFilled(experimentsDetails *experimentTypes.ExperimentDetails, usedEphemeralStorageSize int, ephemeralStorageLimit int) int {
// derive the target size from the storage limit and the requested fill percentage
requirementToBeFill := (ephemeralStorageLimit * experimentsDetails.FillPercentage) / 100
needToBeFilled := requirementToBeFill - usedEphemeralStorageSize
return needToBeFilled
}
// Remedy deletes the target pod if it has been evicted.
// If the target pod is still running, it deletes the files that were created during chaos execution.
func Remedy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, containerID string, podName string, execCommandDetails *exec.PodDetails) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podName, v1.GetOptions{})
if err != nil {
return err
}
// Deleting the pod as pod is already evicted
podReason := pod.Status.Reason
if podReason == "Evicted" {
if err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(podName, &v1.DeleteOptions{}); err != nil {
return err
}
} else {
// deleting the files after chaos execution
command := "rm -rf /diskfill/" + containerID + "/diskfill"
_, err = exec.Exec(execCommandDetails, clients, strings.Fields(command))
if err != nil {
return errors.Errorf("Unable to delete files to reset ephemeral storage usage due to err: %v", err)
}
}
return nil
}
// getPodEnv derive all the env required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
// GetHelperPodName check for the helper pod, which is scheduled on the target node
func GetHelperPodName(targetPod apiv1.Pod, clients clients.ClientSets, namespace, labels string) (string, error) {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("FILL_PERCENTAGE", experimentsDetails.FillPercentage).
SetEnv("EPHEMERAL_STORAGE_MEBIBYTES", experimentsDetails.EphemeralStorageMebibytes).
SetEnv("DATA_BLOCK_SIZE", strconv.Itoa(experimentsDetails.DataBlockSize)).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
podList, err := clients.KubeClient.CoreV1().Pods(namespace).List(v1.ListOptions{LabelSelector: labels})
return envDetails.ENV
}
// setChaosTunables sets each tunable to a random value within the provided range.
// If the value is not provided as a range, it keeps the initially provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.FillPercentage = common.ValidateRange(experimentsDetails.FillPercentage)
experimentsDetails.EphemeralStorageMebibytes = common.ValidateRange(experimentsDetails.EphemeralStorageMebibytes)
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
if err != nil || len(podList.Items) == 0 {
return "", errors.Errorf("Unable to list the helper pods, %v", err)
}
for _, pod := range podList.Items {
if pod.Spec.NodeName == targetPod.Spec.NodeName {
return pod.Name, nil
}
}
return "", errors.Errorf("No helper pod is available on %v node", targetPod.Spec.NodeName)
}


@@ -1,180 +0,0 @@
package lib
import (
"context"
"fmt"
"strconv"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/docker-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareDockerServiceKill contains the preparation steps before chaos injection
func PrepareDockerServiceKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDockerServiceKillFault")
defer span.End()
var err error
if experimentsDetails.TargetNode == "" {
//Select node for docker-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node name")
}
}
log.InfoWithValues("[Info]: Details of node under chaos injection", logrus.Fields{
"NodeName": experimentsDetails.TargetNode,
})
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.TargetNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
// Creating the helper pod to perform docker-service-kill
if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDockerServiceKillFaultHelperPod")
defer span.End()
privileged := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Volumes: []apiv1.Volume{
{
Name: "bus",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run",
},
},
},
{
Name: "root",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/",
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"sleep 10 && systemctl stop docker && sleep " + strconv.Itoa(experimentsDetails.ChaosDuration) + " && systemctl start docker",
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "bus",
MountPath: "/var/run",
},
{
Name: "root",
MountPath: "/node",
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privileged,
},
TTY: true,
},
},
Tolerations: []apiv1.Toleration{
{
Key: "node.kubernetes.io/not-ready",
Operator: apiv1.TolerationOperator("Exists"),
Effect: apiv1.TaintEffect("NoExecute"),
TolerationSeconds: ptrint64(int64(experimentsDetails.ChaosDuration) + 60),
},
{
Key: "node.kubernetes.io/unreachable",
Operator: apiv1.TolerationOperator("Exists"),
Effect: apiv1.TaintEffect("NoExecute"),
TolerationSeconds: ptrint64(int64(experimentsDetails.ChaosDuration) + 60),
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
func ptrint64(p int64) *int64 {
return &p
}


@@ -1,83 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareEBSLossByID contains the preparation and injection steps for the experiment
func PrepareEBSLossByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//get the volume id or list of volume ids
volumeIDList := strings.Split(experimentsDetails.EBSVolumeID, ",")
if experimentsDetails.EBSVolumeID == "" || len(volumeIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume id found to detach"}
}
// watching for the abort signal and revert the chaos
go ebsloss.AbortWatcher(experimentsDetails, volumeIDList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
}
return nil
}


@@ -1,80 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareEBSLossByTag contains the preparation and injection steps for the experiment
func PrepareEBSLossByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
targetEBSVolumeIDList := common.FilterBasedOnPercentage(experimentsDetails.VolumeAffectedPerc, experimentsDetails.TargetVolumeIDList)
log.Infof("[Chaos]:Number of volumes targeted: %v", len(targetEBSVolumeIDList))
// watching for the abort signal and revert the chaos
go ebsloss.AbortWatcher(experimentsDetails, targetEBSVolumeIDList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
}
return nil
}


@@ -1,239 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
ebs "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ebs"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
// InjectChaosInSerialMode injects the ebs loss chaos in serial mode, that is, one volume after the other
func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, volumeID := range targetEBSVolumeIDList {
//Get volume attachment details
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the attachment info")
}
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err = ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err = ebs.WaitForVolumeDetachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "ebs detachment failed")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceID, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
case "attached":
log.Info("[Skip]: The EBS volume is already attached")
default:
//Attaching the ebs volume back to the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceID, device, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for %v volume", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// InjectChaosInParallelMode injects the chaos in parallel mode, that is, on all volumes at once
func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInParallelMode")
defer span.End()
var ec2InstanceIDList, deviceList []string
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//prepare the instance IDs and device names for all the given volumes
for _, volumeID := range targetEBSVolumeIDList {
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the attachment info")
}
if ec2InstanceID == "" || device == "" {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: "Volume not attached to any instance",
Target: fmt.Sprintf("EBS Volume ID: %v", volumeID),
}
}
ec2InstanceIDList = append(ec2InstanceIDList, ec2InstanceID)
deviceList = append(deviceList, device)
}
for _, volumeID := range targetEBSVolumeIDList {
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err := ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
}
log.Info("[Info]: Checking if the detachment process initiated")
if err := ebs.CheckEBSDetachmentInitialisation(targetEBSVolumeIDList, ec2InstanceIDList, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "failed to initialise the detachment")
}
for i, volumeID := range targetEBSVolumeIDList {
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err := ebs.WaitForVolumeDetachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "ebs detachment failed")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
for i, volumeID := range targetEBSVolumeIDList {
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceIDList[i], experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
case "attached":
log.Info("[Skip]: The EBS volume is already attached")
default:
//Attaching the ebs volume back to the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceIDList[i], deviceList[i], experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for volume %v", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// AbortWatcher watches for the abort signal and reverts the chaos
func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, volumeIDList []string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, volumeID := range volumeIDList {
//Get volume attachment details
instanceID, deviceName, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get the attachment info: %v", err)
}
//Getting the EBS volume attachment status
// use the per-iteration volumeID: EBSVolumeID may hold a comma-separated list or be empty in tag mode
ebsState, err := ebs.GetEBSStatus(volumeID, instanceID, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get the ebs status when an abort signal is received: %v", err)
}
if ebsState != "attached" {
//Wait for ebs volume detachment
//We first wait for the volume to reach the detached state, then we attach it.
log.Info("[Abort]: Wait for EBS complete volume detachment")
if err = ebs.WaitForVolumeDetachment(volumeID, instanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("Unable to detach the ebs volume: %v", err)
}
//Attaching the ebs volume back to the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
err = ebs.EBSVolumeAttach(volumeID, instanceID, deviceName, experimentsDetails.Region)
if err != nil {
log.Errorf("EBS attachment failed when an abort signal is received: %v", err)
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}


@@ -1,265 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-id/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareEC2TerminateByID contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//get the instance id or list of instance ids
instanceIDList := strings.Split(experimentsDetails.Ec2InstanceID, ",")
if experimentsDetails.Ec2InstanceID == "" || len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no EC2 instance ID found to terminate"}
}
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIDList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode injects the ec2 instance termination in serial mode, that is, one instance after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//PowerOff the instance
for i, id := range instanceIDList {
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the EC2 instance
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode injects the ec2 instance termination in parallel mode, that is, on all instances at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//PowerOff the instance
for _, id := range instanceIDList {
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
for _, id := range instanceIDList {
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the EC2 instance
if experimentsDetails.ManagedNodegroup != "enable" {
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
for _, id := range instanceIDList {
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
for _, id := range instanceIDList {
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher waits for the abort signal and then reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

@ -1,296 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-tag/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
var inject, abort chan os.Signal
// PrepareEC2TerminateByTag contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
instanceIDList := common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, experimentsDetails.TargetInstanceIDList)
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIDList))
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIDList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode injects the EC2 instance termination in serial mode, that is, one instance after another
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp holds the time at which chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//PowerOff the instance
for i, id := range instanceIDList {
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the EC2 instance
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode injects the EC2 instance termination in parallel mode, that is, on all instances at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp holds the time at which chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//PowerOff the instance
for _, id := range instanceIDList {
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
for _, id := range instanceIDList {
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the EC2 instance
if experimentsDetails.ManagedNodegroup != "enable" {
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
for _, id := range instanceIDList {
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
for _, id := range instanceIDList {
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// SetTargetInstance selects the target instances that are in the running state, filtered by the given instance tag
func SetTargetInstance(experimentsDetails *experimentTypes.ExperimentDetails) error {
instanceIDList, err := awslib.GetInstanceList(experimentsDetails.Ec2InstanceTag, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the instance id list")
}
if len(instanceIDList) == 0 {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeTargetSelection,
Reason: fmt.Sprintf("no instance found with the given tag %v, in region %v", experimentsDetails.Ec2InstanceTag, experimentsDetails.Region),
}
}
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the instance status while selecting the target instances")
}
if instanceState == "running" {
experimentsDetails.TargetInstanceIDList = append(experimentsDetails.TargetInstanceIDList, id)
}
}
if len(experimentsDetails.TargetInstanceIDList) == 0 {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: "failed to get any running instance",
Target: fmt.Sprintf("EC2 Instance Tag: %v", experimentsDetails.Ec2InstanceTag)}
}
log.InfoWithValues("[Info]: Targeting the running instances filtered from instance tag", logrus.Fields{
"Total number of instances filtered": len(instanceIDList),
"Number of running instances filtered": len(experimentsDetails.TargetInstanceIDList),
})
return nil
}
// abortWatcher waits for the abort signal and then reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

@ -1,312 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareDiskVolumeLossByLabel contains the preparation and injection steps for the experiment
func PrepareDiskVolumeLossByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPDiskVolumeLossFaultByLabel")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
diskVolumeNamesList := common.FilterBasedOnPercentage(experimentsDetails.DiskAffectedPerc, experimentsDetails.TargetDiskVolumeNamesList)
if err := getDeviceNamesAndVMInstanceNames(diskVolumeNamesList, computeService, experimentsDetails); err != nil {
return err
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// watching for the abort signal and revert the chaos
go abortWatcher(computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInSerialMode")
defer span.End()
// ChaosStartTimeStamp holds the time at which chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
return stacktrace.Propagate(err, "failed to get the disk volume status")
}
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume back to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for %v volume", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode that means all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInParallelMode")
defer span.End()
// ChaosStartTimeStamp holds the time at which chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on vm instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
}
for i := range targetDiskVolumeNamesList {
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
return stacktrace.Propagate(err, "failed to get the disk status")
}
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// abortWatcher waits for the abort signal and then reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to get in detached state then we are attaching it.
log.Infof("[Abort]: Wait for %s complete disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume back to the instance
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getDeviceNamesAndVMInstanceNames fetches the device name and attached VM instance name for each target disk
func getDeviceNamesAndVMInstanceNames(diskVolumeNamesList []string, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails) error {
for i := range diskVolumeNamesList {
instanceName, err := gcp.GetVolumeAttachmentDetails(computeService, experimentsDetails.GCPProjectID, experimentsDetails.Zones, diskVolumeNamesList[i])
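if err != nil {
return stacktrace.Propagate(err, "failed to get the disk attachment info")
}
// stacktrace.Propagate returns nil for a nil cause, so an empty instance name needs its own error
if instanceName == "" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("disk %s is not attached to any vm instance", diskVolumeNamesList[i])}
}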
deviceName, err := gcp.GetDiskDeviceNameForVM(computeService, diskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones, instanceName)
if err != nil {
return stacktrace.Propagate(err, "failed to fetch the disk device name")
}
experimentsDetails.TargetDiskInstanceNamesList = append(experimentsDetails.TargetDiskInstanceNamesList, instanceName)
experimentsDetails.DeviceNamesList = append(experimentsDetails.DeviceNamesList, deviceName)
}
return nil
}
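A lookup that can return an empty name without an error needs two separate checks, because wrapping a nil error generally produces a nil result and the caller then sees success. The sketch below shows the two checks using only the standard library. `resolveInstances` and `lookup` are hypothetical stand-ins for the helper above and the cloud API call, not litmus-go functions.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveInstances maps each disk to the instance it is attached to.
// lookup is a hypothetical stand-in for the cloud API call.
func resolveInstances(disks []string, lookup func(disk string) (string, error)) ([]string, error) {
	instances := make([]string, 0, len(disks))
	for _, d := range disks {
		name, err := lookup(d)
		if err != nil {
			return nil, fmt.Errorf("failed to get attachment info for disk %s: %w", d, err)
		}
		// No error but also no instance: report it explicitly instead of
		// wrapping a nil error.
		if name == "" {
			return nil, fmt.Errorf("disk %s is not attached to any instance", d)
		}
		instances = append(instances, name)
	}
	return instances, nil
}

func main() {
	lookup := func(disk string) (string, error) {
		switch disk {
		case "disk-a":
			return "vm-1", nil
		case "disk-b":
			return "", nil // detached disk: the call succeeds but finds no instance
		}
		return "", errors.New("not found")
	}
	_, err := resolveInstances([]string{"disk-a", "disk-b"}, lookup)
	fmt.Println(err)
}
```

Returning a distinct error for the empty case also keeps the disk, instance, and device lists the same length, which the index-based loops in the injection functions rely on.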

@ -1,303 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareDiskVolumeLoss contains the preparation and injection steps for the experiment
func PrepareDiskVolumeLoss(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMDiskLossFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//get the disk volume names list
diskNamesList := strings.Split(experimentsDetails.DiskVolumeNames, ",")
//get the disk zones list
diskZonesList := strings.Split(experimentsDetails.Zones, ",")
//get the device names for the given disks
if err := getDeviceNamesList(computeService, experimentsDetails, diskNamesList, diskZonesList); err != nil {
return stacktrace.Propagate(err, "failed to fetch the disk device names")
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// watching for the abort signal and revert the chaos
go abortWatcher(computeService, experimentsDetails, diskNamesList, diskZonesList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
return stacktrace.Propagate(err, fmt.Sprintf("failed to get %s disk volume status", targetDiskVolumeNamesList[i]))
}
switch diskState {
case "attached":
log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
log.Infof("[Chaos]: Attaching %s disk volume back to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode which means all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
}
for i := range targetDiskVolumeNamesList {
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
return stacktrace.Propagate(err, fmt.Sprintf("failed to get %s disk volume status", targetDiskVolumeNamesList[i]))
}
switch diskState {
case "attached":
log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to get in detached state then we are attaching it.
log.Infof("[Abort]: Wait for complete disk volume detachment for %s", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume back to the instance
log.Infof("[Chaos]: Attaching %s disk volume back to the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getDeviceNamesList fetches the device names for the target disks
func getDeviceNamesList(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, diskNamesList, diskZonesList []string) error {
for i := range diskNamesList {
deviceName, err := gcp.GetDiskDeviceNameForVM(computeService, diskNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.TargetDiskInstanceNamesList[i])
if err != nil {
return err
}
experimentsDetails.DeviceNamesList = append(experimentsDetails.DeviceNamesList, deviceName)
}
return nil
}
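
The disk-loss code above indexes `diskNamesList[i]`, `diskZonesList[i]` and `TargetDiskInstanceNamesList[i]` with the same `i`, so the comma-separated inputs must contain the same number of entries or the loop will panic with an index out of range. A minimal sketch of an up-front length check follows; `splitPaired` is a hypothetical helper for illustration, not part of litmus-go.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPaired splits two comma-separated strings and checks that they
// describe the same number of items, so that names[i] and zones[i] can be
// indexed together safely. Hypothetical helper, not part of litmus-go.
func splitPaired(names, zones string) ([]string, []string, error) {
	n := strings.Split(names, ",")
	z := strings.Split(zones, ",")
	if len(n) != len(z) {
		return nil, nil, fmt.Errorf("got %d names but %d zones", len(n), len(z))
	}
	return n, z, nil
}

func main() {
	// Matching lists: both slices can be walked with a single index.
	names, zones, err := splitPaired("disk-1,disk-2", "us-central1-a,us-central1-b")
	fmt.Println(names, zones, err)

	// Mismatched lists: the error is reported before any indexing happens.
	_, _, err = splitPaired("disk-1,disk-2", "us-central1-a")
	fmt.Println(err)
}
```

Returning an error here lets the caller fail the experiment cleanly instead of crashing partway through a detach/attach cycle.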


@ -1,293 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var inject, abort chan os.Signal
// PrepareVMStopByLabel executes the experiment steps by injecting chaos into target VM instances
func PrepareVMStopByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPVMInstanceStopFaultByLabel")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
instanceNamesList := common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, experimentsDetails.TargetVMInstanceNameList)
log.Infof("[Chaos]: Number of instances targeted: %v", len(instanceNamesList))
// watching for the abort signal and revert the chaos
go abortWatcher(computeService, experimentsDetails, instanceNamesList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Stop the instance
for i := range instanceNamesList {
//Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "VM instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
//Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to stop", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for the chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "enable":
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start %s vm instance", instanceNamesList[i])
}
default:
// starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start %s vm instance", instanceNamesList[i])
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// power-off the instance
for i := range instanceNamesList {
// stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
}
for i := range instanceNamesList {
// wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "enable":
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start the vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
default:
// starting the VM instance
for i := range instanceNamesList {
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
}
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start the vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for i := range instanceNamesList {
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil {
log.Errorf("Failed to get %s instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
}
if instanceState != "RUNNING" && experimentsDetails.ManagedInstanceGroup != "enable" {
log.Info("[Abort]: Waiting for the VM instance to shut down")
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
log.Errorf("Failed while waiting for %s instance to stop, err: %v", instanceNamesList[i], err)
}
log.Info("[Abort]: Starting VM instance as abort signal received")
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil {
log.Errorf("%s instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}


@ -1,304 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareVMStop contains the preparation and injection steps for the experiment
func PrepareVMStop(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// get the instance name or list of instance names
instanceNamesList := strings.Split(experimentsDetails.VMInstanceName, ",")
// get the zone name or list of corresponding zones for the instances
instanceZonesList := strings.Split(experimentsDetails.Zones, ",")
go abortWatcher(computeService, experimentsDetails, instanceNamesList, instanceZonesList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// wait for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Stop the instance
for i := range instanceNamesList {
//Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
//Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for the chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "disable":
// starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
default:
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// power-off the instance
for i := range instanceNamesList {
// stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
}
for i := range instanceNamesList {
// wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "disable":
// starting the VM instance
for i := range instanceNamesList {
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
}
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
default:
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, zonesList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
if experimentsDetails.ManagedInstanceGroup != "enable" {
for i := range instanceNamesList {
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil {
log.Errorf("Failed to get %s vm instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
}
if instanceState != "RUNNING" {
log.Infof("[Abort]: Waiting for %s VM instance to shut down", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i]); err != nil {
log.Errorf("Unable to wait till stop of %s instance, err: %v", instanceNamesList[i], err)
}
log.Infof("[Abort]: Starting %s VM instance as abort signal is received", instanceNamesList[i])
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil {
log.Errorf("%s VM instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
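
The abort path above restarts an instance only when it is not part of a managed instance group and its reported state is not RUNNING. The following is a minimal sketch of that decision as a pure function. The name `needsRestart` is hypothetical and does not appear in the original file. It only illustrates the condition and makes no GCP calls.

```go
package main

import "fmt"

// needsRestart is a hypothetical helper that mirrors the abort-time check:
// instances in a managed instance group are left to the group to recover,
// and instances already RUNNING need no action.
func needsRestart(managedInstanceGroup, instanceState string) bool {
	if managedInstanceGroup == "enable" {
		return false
	}
	return instanceState != "RUNNING"
}

func main() {
	for _, state := range []string{"RUNNING", "STOPPING", "TERMINATED"} {
		fmt.Printf("state=%s restart=%v\n", state, needsRestart("disable", state))
	}
}
```

Note that in the original code a failed status lookup leaves the state empty, which this condition also treats as "not RUNNING".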


@ -1,332 +0,0 @@
package helper
import (
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var (
err error
inject, abort chan os.Signal
)
// Helper injects the http chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodHTTPFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed for the helper pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
err := prepareK8sHttpChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// prepareK8sHttpChaos contains the preparation steps before chaos injection
func prepareK8sHttpChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPauseAndSandboxPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watch for the abort signal and revert the chaos when it arrives
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
for _, t := range targets {
// injecting http chaos inside target container
if err = injectChaos(experimentsDetails, t); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaos(experimentsDetails, t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: chaos duration is over, reverting chaos")
var errList []string
for _, t := range targets {
// remove the iptables rules and stop the proxy after chaos injection
err := revertChaos(experimentsDetails, t)
if err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// injectChaos injects the http chaos in the target container and adds an iptables rule to redirect the target port to the proxy port
func injectChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
if err := startProxy(experimentDetails, t.Pid); err != nil {
killErr := killProxy(t.Pid, t.Source)
if killErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
}
return stacktrace.Propagate(err, "could not start proxy server")
}
if err := addIPRuleSet(experimentDetails, t.Pid); err != nil {
killErr := killProxy(t.Pid, t.Source)
if killErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
}
return stacktrace.Propagate(err, "could not add ip rules")
}
return nil
}
// revertChaos reverts the http chaos in the target container
func revertChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
var errList []string
if err := removeIPRuleSet(experimentDetails, t.Pid); err != nil {
errList = append(errList, err.Error())
}
if err := killProxy(t.Pid, t.Source); err != nil {
errList = append(errList, err.Error())
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
return nil
}
// startProxy starts the proxy process for the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and runs the proxy-related commands inside it.
func startProxy(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
toxics := os.Getenv("TOXIC_COMMAND")
// starting toxiproxy server inside the target container
startProxyServerCommand := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-server -host=0.0.0.0 > /dev/null 2>&1 &)", pid)
// Creating a proxy for the targeted service in the target container
createProxyCommand := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-cli create -l 0.0.0.0:%d -u 0.0.0.0:%d proxy)", pid, experimentDetails.ProxyPort, experimentDetails.TargetServicePort)
createToxicCommand := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-cli toxic add %s --toxicity %f proxy)", pid, toxics, float32(experimentDetails.Toxicity)/100.0)
// sleep 2 is added for proxy-server to be ready for creating proxy and adding toxics
chaosCommand := fmt.Sprintf("%s && sleep 2 && %s && %s", startProxyServerCommand, createProxyCommand, createToxicCommand)
log.Infof("[Chaos]: Starting proxy server")
if err := common.RunBashCommand(chaosCommand, "failed to start proxy server", experimentDetails.ChaosPodName); err != nil {
return err
}
log.Info("[Info]: Proxy started successfully")
return nil
}
const NoProxyToKill = "you need to specify whom to kill"
// killProxy kills the proxy process for the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and runs the kill command inside it.
func killProxy(pid int, source string) error {
stopProxyServerCommand := fmt.Sprintf("sudo nsenter -t %d -n sudo kill -9 $(ps aux | grep [t]oxiproxy | awk 'FNR==2{print $2}')", pid)
log.Infof("[Chaos]: Stopping proxy server")
if err := common.RunBashCommand(stopProxyServerCommand, "failed to stop proxy server", source); err != nil {
return err
}
log.Info("[Info]: Proxy stopped successfully")
return nil
}
// addIPRuleSet adds the REDIRECT rule to iptables in the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and runs the iptables command inside it.
func addIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
// It inserts the proxy port REDIRECT rule at the beginning of the PREROUTING chain of the nat table,
// so that it is evaluated first for every incoming packet on the given interface; packets that match
// the target port are redirected to the proxy port.
addIPRuleSetCommand := fmt.Sprintf("(sudo nsenter -t %d -n iptables -t nat -I PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d)", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Adding IPtables ruleset")
if err := common.RunBashCommand(addIPRuleSetCommand, "failed to add ip rules", experimentDetails.ChaosPodName); err != nil {
return err
}
log.Info("[Info]: IP rule set added successfully")
return nil
}
const NoIPRulesetToRemove = "No chain/target/match by that name"
// removeIPRuleSet removes the REDIRECT rule from iptables in the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and runs the iptables command inside it.
func removeIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
removeIPRuleSetCommand := fmt.Sprintf("sudo nsenter -t %d -n iptables -t nat -D PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Removing IPtables ruleset")
if err := common.RunBashCommand(removeIPRuleSetCommand, "failed to remove ip rules", experimentDetails.ChaosPodName); err != nil {
return err
}
log.Info("[Info]: IP rule set removed successfully")
return nil
}
// getENV reads the env variables passed to the helper pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "")
experimentDetails.TargetServicePort, _ = strconv.Atoi(types.Getenv("TARGET_SERVICE_PORT", ""))
experimentDetails.ProxyPort, _ = strconv.Atoi(types.Getenv("PROXY_PORT", ""))
experimentDetails.Toxicity, _ = strconv.Atoi(types.Getenv("TOXICITY", "100"))
}
// abortWatcher waits for the abort signal and then reverts the chaos, retrying up to three times
func abortWatcher(targets []targetDetails, resultName, chaosNS string, experimentDetails *experimentTypes.ExperimentDetails) {
<-abort
log.Info("[Abort]: Killing process started because of terminated signal received")
log.Info("[Abort]: Chaos Revert Started")
retry := 3
for retry > 0 {
for _, t := range targets {
if err = revertChaos(experimentDetails, t); err != nil {
if strings.Contains(err.Error(), NoIPRulesetToRemove) && strings.Contains(err.Error(), NoProxyToKill) {
continue
}
log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
Source string
}


@ -1,37 +0,0 @@
package header
import (
"context"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpModifyHeaderChaos contains the steps to prepare and inject http modify header chaos
func PodHttpModifyHeaderChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyHeaderFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"Headers": experimentsDetails.HeadersMap,
"Header Mode": experimentsDetails.HeaderMode,
})
stream := "downstream"
if experimentsDetails.HeaderMode == "request" {
stream = "upstream"
}
args := "-t header --" + stream + " -a headers='" + (experimentsDetails.HeadersMap) + "' -a mode=" + experimentsDetails.HeaderMode
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
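
The header fault above maps the header mode to a toxiproxy stream direction (`request` becomes `upstream`, anything else becomes `downstream`) and concatenates the toxic arguments. The following is a minimal sketch of that string construction on its own. The function name `headerToxicArgs` and the sample header JSON are assumptions for illustration.

```go
package main

import "fmt"

// headerToxicArgs is a hypothetical helper that reproduces the argument
// string built in PodHttpModifyHeaderChaos: request headers are modified on
// the upstream side, response headers on the downstream side.
func headerToxicArgs(headers, mode string) string {
	stream := "downstream"
	if mode == "request" {
		stream = "upstream"
	}
	return "-t header --" + stream + " -a headers='" + headers + "' -a mode=" + mode
}

func main() {
	// Sample header map; the JSON shape is an assumption for the example.
	fmt.Println(headerToxicArgs(`{"X-Test":"1"}`, "request"))
	fmt.Println(headerToxicArgs(`{"X-Test":"1"}`, "response"))
}
```

Because the headers are wrapped in single quotes and later run through a shell, a header value containing a single quote would break the quoting. The original code does not escape it, and neither does this sketch.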


@ -1,266 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
var err error
// Get the target pod details for the chaos execution.
// If TARGET_PODS is not set, a random target pod list is derived from the app details using the pods affected percentage.
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Get the service account name; the helper pod needs its permissions to create events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the http chaos into the target application pods serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform http chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode injects the http chaos into the target application pods in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID, args string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateHTTPChaosHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "cri-socket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"./helpers -name http-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets, args),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
MountPath: experimentsDetails.SocketPath,
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"NET_ADMIN",
"SYS_ADMIN",
},
},
},
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env variables required by the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets, args string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("TOXIC_COMMAND", args).
SetEnv("NETWORK_INTERFACE", experimentsDetails.NetworkInterface).
SetEnv("TARGET_SERVICE_PORT", strconv.Itoa(experimentsDetails.TargetServicePort)).
SetEnv("PROXY_PORT", strconv.Itoa(experimentsDetails.ProxyPort)).
SetEnv("TOXICITY", strconv.Itoa(experimentsDetails.Toxicity)).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
// SetChaosTunables picks a random value for each tunable that is given as a range.
// A tunable that is not given as a range keeps the value that was provided.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
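
Both injection modes above pass targets to the helper pod as `name:namespace:container` triples, joined with `;` when several targets share a node. The following is a minimal sketch of that encoding. The `target` type and both function names are hypothetical, and `decodeTargets` is only an assumed inverse for illustration. The real parsing lives in `common.ParseTargets`, which is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// target is a hypothetical stand-in for the per-pod details used above.
type target struct {
	Name, Namespace, Container string
}

// encodeTargets joins targets as "name:namespace:container" separated by ";",
// matching the format built in injectChaosInParallelMode.
func encodeTargets(targets []target) string {
	parts := make([]string, 0, len(targets))
	for _, t := range targets {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	return strings.Join(parts, ";")
}

// decodeTargets is an assumed inverse of encodeTargets, for illustration only.
func decodeTargets(encoded string) ([]target, error) {
	var out []target
	for _, item := range strings.Split(encoded, ";") {
		fields := strings.Split(item, ":")
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed target %q", item)
		}
		out = append(out, target{fields[0], fields[1], fields[2]})
	}
	return out, nil
}

func main() {
	encoded := encodeTargets([]target{
		{"web-0", "default", "nginx"},
		{"web-1", "default", "nginx"},
	})
	fmt.Println(encoded)

	decoded, err := decodeTargets(encoded)
	fmt.Println(decoded, err)
}
```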


@ -1,33 +0,0 @@
package latency
import (
"context"
"strconv"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpLatencyChaos contains the steps to prepare and inject http latency chaos
func PodHttpLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpLatencyFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"Latency": experimentsDetails.Latency,
})
args := "-t latency -a latency=" + strconv.Itoa(experimentsDetails.Latency)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
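
The latency arguments built above reach the helper through the `TOXIC_COMMAND` env variable, where they are spliced into a `toxiproxy-cli toxic add` command together with the toxicity expressed as a fraction. The following is a minimal sketch of how the two pieces combine, using the format strings from the helper. The function names and the sample PID are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// latencyArgs reproduces the toxic arguments built in PodHttpLatencyChaos.
func latencyArgs(latencyMs int) string {
	return "-t latency -a latency=" + strconv.Itoa(latencyMs)
}

// toxicAddCommand is a hypothetical helper using the same format string as
// startProxy: the toxicity percentage (0-100) is converted to a 0-1 fraction.
func toxicAddCommand(pid int, toxicArgs string, toxicityPercent int) string {
	return fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-cli toxic add %s --toxicity %f proxy)",
		pid, toxicArgs, float32(toxicityPercent)/100.0)
}

func main() {
	// 1234 is a placeholder PID; 2000 ms latency applied to 50% of connections.
	fmt.Println(toxicAddCommand(1234, latencyArgs(2000), 50))
}
```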


@ -1,50 +0,0 @@
package modifybody
import (
"context"
"fmt"
"math"
"strings"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpModifyBodyChaos contains the steps to prepare and inject http modify body chaos
func PodHttpModifyBodyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyBodyFault")
defer span.End()
// responseBodyMaxLength defines the max length of response body string to be printed. It is taken as
// the min of length of body and 120 characters to avoid printing large response body.
responseBodyMaxLength := int(math.Min(float64(len(experimentsDetails.ResponseBody)), 120))
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"ResponseBody": experimentsDetails.ResponseBody[0:responseBodyMaxLength],
"Content Type": experimentsDetails.ContentType,
"Content Encoding": experimentsDetails.ContentEncoding,
})
args := fmt.Sprintf(
`-t modify_body -a body="%v" -a content_type=%v -a content_encoding=%v`,
EscapeQuotes(experimentsDetails.ResponseBody), experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// EscapeQuotes escapes backslashes and double quotes in the given string
func EscapeQuotes(input string) string {
output := strings.ReplaceAll(input, `\`, `\\`)
output = strings.ReplaceAll(output, `"`, `\"`)
return output
}
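
A minimal sketch of the two string-handling steps in the modify-body file above: escaping the response body before it is embedded in a quoted argument, and capping the logged body at 120 bytes. `escapeQuotes` restates `EscapeQuotes`; `truncateForLog` is a hypothetical helper (not in the original) that stands in for the `math.Min` plus slice expression.

```go
package main

import "strings"

// escapeQuotes mirrors EscapeQuotes above: backslashes are doubled first,
// then every double quote is prefixed with a backslash.
func escapeQuotes(input string) string {
	output := strings.ReplaceAll(input, `\`, `\\`)
	return strings.ReplaceAll(output, `"`, `\"`)
}

// truncateForLog is a hypothetical helper that returns at most limit bytes
// of s, equivalent to ResponseBody[0:min(len(ResponseBody), limit)].
// It slices bytes, so a multi-byte UTF-8 character may be cut in half.
func truncateForLog(s string, limit int) string {
	if len(s) > limit {
		return s[:limit]
	}
	return s
}
```

The order of the two replacements matters: escaping backslashes first means the backslashes introduced for the quotes are not doubled afterwards.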


@ -1,33 +0,0 @@
package reset
import (
"context"
"strconv"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpResetPeerChaos contains the steps to prepare and inject http reset peer chaos
func PodHttpResetPeerChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPResetPeerFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"Reset Timeout": experimentsDetails.ResetTimeout,
})
args := "-t reset_peer -a timeout=" + strconv.Itoa(experimentsDetails.ResetTimeout)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
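
The latency and reset-peer faults above both assemble their argument string as `-t <type>` followed by `-a key=value` pairs. The sketch below shows that pattern in one place; `toxicAttr` and `buildToxicArgs` are hypothetical names introduced for illustration and are not part of the original package.

```go
package main

import (
	"strconv"
	"strings"
)

// toxicAttr is a hypothetical key/value pair rendered as one "-a key=value" flag.
type toxicAttr struct {
	Key, Value string
}

// buildToxicArgs is a hypothetical helper that renders "-t <type>" followed
// by one "-a key=value" flag per attribute, in the order given.
func buildToxicArgs(toxicType string, attrs ...toxicAttr) string {
	var b strings.Builder
	b.WriteString("-t " + toxicType)
	for _, a := range attrs {
		b.WriteString(" -a " + a.Key + "=" + a.Value)
	}
	return b.String()
}

// resetPeerArgs produces the same string as the concatenation in
// PodHttpResetPeerChaos for a given timeout value.
func resetPeerArgs(timeout int) string {
	return buildToxicArgs("reset_peer", toxicAttr{"timeout", strconv.Itoa(timeout)})
}
```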


@ -1,118 +0,0 @@
package statuscode
import (
"context"
"fmt"
"math"
"math/rand"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"go.opentelemetry.io/otel"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
body "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib/modify-body"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
)
var acceptedStatusCodes = []string{
"200", "201", "202", "204",
"300", "301", "302", "304", "307",
"400", "401", "403", "404",
"500", "501", "502", "503", "504",
}
// PodHttpStatusCodeChaos contains the steps to prepare and inject http status code chaos
func PodHttpStatusCodeChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpStatusCodeFault")
defer span.End()
// responseBodyMaxLength defines the max length of response body string to be printed. It is taken as
// the min of length of body and 120 characters to avoid printing large response body.
responseBodyMaxLength := int(math.Min(float64(len(experimentsDetails.ResponseBody)), 120))
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"StatusCode": experimentsDetails.StatusCode,
"ModifyResponseBody": experimentsDetails.ModifyResponseBody,
"ResponseBody": experimentsDetails.ResponseBody[0:responseBodyMaxLength],
"Content Type": experimentsDetails.ContentType,
"Content Encoding": experimentsDetails.ContentEncoding,
})
args := fmt.Sprintf(
`-t status_code -a status_code=%s -a modify_response_body=%d -a response_body="%v" -a content_type=%s -a content_encoding=%s`,
experimentsDetails.StatusCode, stringBoolToInt(experimentsDetails.ModifyResponseBody), body.EscapeQuotes(experimentsDetails.ResponseBody),
experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// GetStatusCode performs two functions:
// 1. It checks if the status code is provided or not. If it's not then it selects a random status code from supported list
// 2. It checks if the provided status code is valid or not.
func GetStatusCode(statusCode string) (string, error) {
// seed before any random selection so the empty-input path is randomized too
rand.Seed(time.Now().Unix())
if statusCode == "" {
log.Info("[Info]: No status code provided. Selecting a status code randomly from supported status codes")
return acceptedStatusCodes[rand.Intn(len(acceptedStatusCodes))], nil
}
statusCodeList := strings.Split(statusCode, ",")
if len(statusCodeList) == 1 {
if checkStatusCode(statusCodeList[0], acceptedStatusCodes) {
return statusCodeList[0], nil
}
} else {
acceptedCodes := getAcceptedCodesInList(statusCodeList, acceptedStatusCodes)
if len(acceptedCodes) == 0 {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("invalid status code: %s", statusCode)}
}
return acceptedCodes[rand.Intn(len(acceptedCodes))], nil
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("status code '%s' is not supported. Supported status codes are: %v", statusCode, acceptedStatusCodes)}
}
// getAcceptedCodesInList returns the list of accepted status codes from a list of status codes
func getAcceptedCodesInList(statusCodeList []string, acceptedStatusCodes []string) []string {
var acceptedCodes []string
for _, statusCode := range statusCodeList {
if checkStatusCode(statusCode, acceptedStatusCodes) {
acceptedCodes = append(acceptedCodes, statusCode)
}
}
return acceptedCodes
}
// checkStatusCode checks if the provided status code is present in the acceptedStatusCodes list
func checkStatusCode(statusCode string, acceptedStatusCodes []string) bool {
for _, code := range acceptedStatusCodes {
if code == statusCode {
return true
}
}
return false
}
// stringBoolToInt converts a boolean string to an int: 1 for true, 0 for false or for an unparsable value
func stringBoolToInt(b string) int {
parsedBool, err := strconv.ParseBool(b)
if err != nil {
return 0
}
if parsedBool {
return 1
}
return 0
}
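
A minimal sketch of the selection logic in `GetStatusCode`, assuming a shortened supported list and an explicitly passed random source instead of the global one. `pickStatusCode`, `isSupported`, `newRNG`, and `supportedCodes` are hypothetical names. Unlike the original, the single-code and multi-code cases share one path and one error message.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// supportedCodes is a shortened, illustrative stand-in for acceptedStatusCodes.
var supportedCodes = []string{"200", "404", "503"}

// newRNG returns a random source with a fixed seed so runs are repeatable.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// isSupported reports whether code appears in supportedCodes.
func isSupported(code string) bool {
	for _, c := range supportedCodes {
		if c == code {
			return true
		}
	}
	return false
}

// pickStatusCode returns a random supported code when input is empty;
// otherwise it filters the comma-separated input down to supported codes
// and picks one of them at random. Entries are not trimmed, so "200, 404"
// would reject " 404", as in the original code.
func pickStatusCode(input string, rng *rand.Rand) (string, error) {
	if input == "" {
		return supportedCodes[rng.Intn(len(supportedCodes))], nil
	}
	var accepted []string
	for _, code := range strings.Split(input, ",") {
		if isSupported(code) {
			accepted = append(accepted, code)
		}
	}
	if len(accepted) == 0 {
		return "", fmt.Errorf("status code %q is not supported; supported codes: %v", input, supportedCodes)
	}
	return accepted[rng.Intn(len(accepted))], nil
}
```

Passing the source as a parameter keeps seeding out of the selection function, so the empty-input branch and the list branch draw from the same seeded generator.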


@ -1,165 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/load/k6-loadgen/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectK6LoadGenFault")
defer span.End()
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
// creating the helper pod to perform k6-loadgen chaos
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareK6LoadGenFault")
defer span.End()
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Starting the k6-loadgen experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not execute chaos")
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateK6LoadGenFaultHelperPod")
defer span.End()
const volumeName = "script-volume"
const mountPath = "/mnt"
var envs []corev1.EnvVar
args := []string{
mountPath + "/" + experimentsDetails.ScriptSecretKey,
"-q",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--tag",
"trace_id=" + span.SpanContext().TraceID().String(),
}
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
envs = []corev1.EnvVar{
{
Name: "K6_OTEL_METRIC_PREFIX",
Value: experimentsDetails.OTELMetricPrefix,
},
{
Name: "K6_OTEL_GRPC_EXPORTER_INSECURE",
Value: "true",
},
{
Name: "K6_OTEL_GRPC_EXPORTER_ENDPOINT",
Value: otelExporterEndpoint,
},
}
args = append(args, "--out", "experimental-opentelemetry")
}
helperPod := &corev1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: corev1.PodSpec{
RestartPolicy: corev1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Containers: []corev1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: corev1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"k6",
"run",
},
Args: args,
Env: envs,
Resources: chaosDetails.Resources,
VolumeMounts: []corev1.VolumeMount{
{
Name: volumeName,
MountPath: mountPath,
},
},
},
},
Volumes: []corev1.Volume{
{
Name: volumeName,
VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: experimentsDetails.ScriptSecretName,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
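
The way `createHelperPod` assembles the `k6 run` arguments can be sketched without the Kubernetes types. `buildK6Args` is a hypothetical, dependency-free restatement; its parameters stand in for `ScriptSecretKey`, `ChaosDuration`, the span's trace ID, and the OTLP endpoint environment variable.

```go
package main

import "strconv"

// buildK6Args is a hypothetical helper that reproduces the args slice from
// createHelperPod: script path, quiet flag, duration in seconds, a trace_id
// tag, and the OpenTelemetry output only when an exporter endpoint is set.
func buildK6Args(mountPath, scriptKey string, durationSec int, traceID, otelEndpoint string) []string {
	args := []string{
		mountPath + "/" + scriptKey,
		"-q",
		"--duration", strconv.Itoa(durationSec) + "s",
		"--tag", "trace_id=" + traceID,
	}
	if otelEndpoint != "" {
		args = append(args, "--out", "experimental-opentelemetry")
	}
	return args
}
```

The helper pod runs `k6 run` with these arguments, so the script path is always the first positional argument and the output flag is appended last.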


@ -1,251 +0,0 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/workloads"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kafka/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKafkaPodDeleteFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.ChaoslibDetail.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.ChaoslibDetail.RampTime)
common.WaitForDuration(experimentsDetails.ChaoslibDetail.RampTime)
}
switch strings.ToLower(experimentsDetails.ChaoslibDetail.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.ChaoslibDetail.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.ChaoslibDetail.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.ChaoslibDetail.RampTime)
common.WaitForDuration(experimentsDetails.ChaoslibDetail.RampTime)
}
return nil
}
// injectChaosInSerialMode deletes the kafka broker pods in serial mode (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
GracePeriod := int64(0)
//ChaosStartTimeStamp contains the timestamp at which chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
// deriving the parent name of the target resources
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return err
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.ChaoslibDetail.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Deleting the application pod
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Killing the following pods", logrus.Fields{
"PodName": pod.Name})
if experimentsDetails.ChaoslibDetail.Force {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaoslibDetail.ChaosInterval != "" {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaoslibDetail.ChaosInterval)
waitTime, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.ChaosInterval)
common.WaitForDuration(waitTime)
}
}
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
log.Infof("[Completion]: %v chaos is done", experimentsDetails.ExperimentName)
return nil
}
// injectChaosInParallelMode deletes the kafka broker pods in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
GracePeriod := int64(0)
//ChaosStartTimeStamp contains the timestamp at which chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.ChaoslibDetail.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Deleting the application pod
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Killing the following pods", logrus.Fields{
"PodName": pod.Name})
if experimentsDetails.ChaoslibDetail.Force {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaoslibDetail.ChaosInterval != "" {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaoslibDetail.ChaosInterval)
waitTime, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.ChaosInterval)
common.WaitForDuration(waitTime)
}
}
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
log.Infof("[Completion]: %v chaos is done", experimentsDetails.ExperimentName)
return nil
}
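
Both injection modes above share the same outer loop: repeat the delete-wait-verify cycle until the elapsed whole seconds reach `ChaosDuration`. The sketch below isolates that loop with an injectable clock so it can be exercised without sleeping. `runForSeconds` and `fakeClock` are hypothetical names, not part of the original package.

```go
package main

import "time"

// fakeClock is a hypothetical manual clock used to drive the loop in tests.
type fakeClock struct{ t time.Time }

// Now returns the clock's current time.
func (c *fakeClock) Now() time.Time { return c.t }

// AdvanceSeconds moves the clock forward by n seconds.
func (c *fakeClock) AdvanceSeconds(n int) {
	c.t = c.t.Add(time.Duration(n) * time.Second)
}

// runForSeconds calls step repeatedly while the elapsed time, truncated to
// whole seconds as in the original loop, is below totalSec. It returns the
// number of completed steps and stops early on the first error.
func runForSeconds(totalSec int, now func() time.Time, step func() error) (int, error) {
	start := now()
	iterations := 0
	for int(now().Sub(start).Seconds()) < totalSec {
		if err := step(); err != nil {
			return iterations, err
		}
		iterations++
	}
	return iterations, nil
}
```

The duration is only checked between iterations, so a step that starts just before the deadline still runs to completion and the total run time can exceed `totalSec`.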


@ -1,47 +1,40 @@
package lib
import (
"context"
"fmt"
"strconv"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/kubelet-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareKubeletKill contains preparation steps before chaos injection
func PrepareKubeletKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKubeletServiceKillFault")
defer span.End()
func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var err error
if experimentsDetails.TargetNode == "" {
if experimentsDetails.AppNode == "" {
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
appNodeName, err := common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node name")
return errors.Errorf("Unable to get the application nodename, err: %v", err)
}
experimentsDetails.AppNode = appNodeName
}
log.InfoWithValues("[Info]: Details of node under chaos injection", logrus.Fields{
"NodeName": experimentsDetails.TargetNode,
"NodeName": experimentsDetails.AppNode,
})
experimentsDetails.RunID = stringutils.GetRunID()
experimentsDetails.RunID = common.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -50,40 +43,57 @@ func PrepareKubeletKill(ctx context.Context, experimentsDetails *experimentTypes
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.TargetNode + " node"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.AppNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Creating the helper pod to perform kubelet service kill
if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
err = CreateHelperPod(experimentsDetails, clients, experimentsDetails.AppNode)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Checking for the node to be in not-ready state
log.Info("[Status]: Check for the node to be in NotReady state")
if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
}
return stacktrace.Propagate(err, "could not check for NOT READY state")
err = status.CheckNodeNotReadyState(experimentsDetails.AppNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("application node is not in NotReady state, err: %v", err)
}
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed, err: %v", err)
}
// Checking the status of application node
log.Info("[Status]: Getting the status of application node")
err = status.CheckNodeStatus(experimentsDetails.AppNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("application node is not in ready state, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
@ -91,30 +101,27 @@ func PrepareKubeletKill(ctx context.Context, experimentsDetails *experimentTypes
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateKubeletServiceKillFaultHelperPod")
defer span.End()
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appNodeName string) error {
privileged := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
Name: experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName,
"name": experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "bus",
@ -137,7 +144,7 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
ImagePullPolicy: apiv1.PullAlways,
Command: []string{
"/bin/bash",
},
@ -145,7 +152,6 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
"-c",
"sleep 10 && systemctl stop kubelet && sleep " + strconv.Itoa(experimentsDetails.ChaosDuration) + " && systemctl start kubelet",
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "bus",
@ -162,35 +168,9 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
TTY: true,
},
},
Tolerations: []apiv1.Toleration{
{
Key: "node.kubernetes.io/not-ready",
Operator: apiv1.TolerationOperator("Exists"),
Effect: apiv1.TaintEffect("NoExecute"),
TolerationSeconds: ptrint64(int64(experimentsDetails.ChaosDuration) + 60),
},
{
Key: "node.kubernetes.io/unreachable",
Operator: apiv1.TolerationOperator("Exists"),
Effect: apiv1.TaintEffect("NoExecute"),
TolerationSeconds: ptrint64(int64(experimentsDetails.ChaosDuration) + 60),
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
func ptrint64(p int64) *int64 {
return &p
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
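
Review note on the helper pod spec above: both the container command and the `TolerationSeconds` window are derived from `ChaosDuration`. The following is a minimal standard-library sketch of that derivation. The names `kubeletKillCommand` and `tolerationSeconds` are hypothetical and introduced only for illustration; they are not functions in litmus-go.

```go
package main

import (
	"fmt"
	"strconv"
)

// kubeletKillCommand mirrors the shell string in the helper pod spec:
// wait 10s, stop kubelet, hold for the chaos duration, then start it again.
// (Hypothetical helper, not part of litmus-go.)
func kubeletKillCommand(chaosDuration int) string {
	return "sleep 10 && systemctl stop kubelet && sleep " +
		strconv.Itoa(chaosDuration) + " && systemctl start kubelet"
}

// tolerationSeconds mirrors the toleration window used on one side of the
// diff: the chaos duration plus a fixed 60 seconds, returned as a pointer
// because the Kubernetes toleration field is an optional *int64.
func tolerationSeconds(chaosDuration int) *int64 {
	s := int64(chaosDuration) + 60
	return &s
}

func main() {
	fmt.Println(kubeletKillCommand(30))
	fmt.Println(*tolerationSeconds(30))
}
```

Keeping the toleration longer than the chaos window presumably lets the helper pod stay on the node while the kubelet is down and the node carries the not-ready or unreachable taint.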


@ -1,7 +1,7 @@
package helper
package main
import (
"context"
"encoding/json"
"fmt"
"os"
"os/exec"
@ -11,413 +11,326 @@ import (
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/network_latency/tc"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
)
const (
qdiscNotFound = "Cannot delete qdisc with handle of zero"
qdiscNoFileFound = "RTNETLINK answers: No such file or directory"
)
var err error
var (
err error
inject, abort chan os.Signal
sPorts, dPorts, whitelistDPorts, whitelistSPorts []string
)
// Helper injects the network chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodNetworkFault")
defer span.End()
func main() {
experimentsDetails := experimentTypes.ExperimentDetails{}
clients := clients.ClientSets{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Fatalf("Unable to Get the kubeconfig, err: %v", err)
}
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed for the helper pod
//Fetching all the ENV passed for the runner pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
GetENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise the chaos attributes
experimentEnv.InitialiseChaosVariables(&chaosDetails, &experimentsDetails)
// Initialise Chaos Result Parameters
// Intialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
err := preparePodNetworkChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
err := PreparePodNetworkChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// preparePodNetworkChaos contains the preparation steps before chaos injection
func preparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetEnv := os.Getenv("TARGETS")
if targetEnv == "" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "no target found, provide at least one target"}
}
var targets []targetDetails
for _, t := range strings.Split(targetEnv, ";") {
target := strings.Split(t, ":")
if len(target) != 4 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported target format: '%v'", t)}
}
td := targetDetails{
Name: target[0],
Namespace: target[1],
TargetContainer: target[2],
DestinationIps: getDestIps(target[3]),
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the network ns path of the pod sandbox or pause container
td.NetworkNsPath, err = common.GetNetworkNsPath(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container network ns path")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos
go abortWatcher(targets, experimentsDetails.NetworkInterface, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
for index, t := range targets {
// injecting network chaos inside target container
if err = injectChaos(experimentsDetails.NetworkInterface, t); err != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
//PreparePodNetworkChaos contains the preparation steps before chaos injection
func PreparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
// extract out the pid of the target container
targetPID, err := GetPID(experimentsDetails, clients)
if err != nil {
return err
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injected " + experimentsDetails.ExperimentName + " chaos on application pods"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
// injecting network chaos inside target container
if err = InjectChaos(experimentsDetails, targetPID); err != nil {
return err
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
// SIGKILL cannot be caught, so only interrupt and SIGTERM are relayed.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
log.Info("[Chaos]: Duration is over, reverting chaos")
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
// updating the chaosresult after stopped
failStep := "Network Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
if err := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
return nil
}
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
if err = tc.Killnetem(targetPID); err != nil {
log.Errorf("unable to kill netem process, err :%v", err)
func revertChaosForAllTargets(targets []targetDetails, networkInterface string, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
killed, err := killnetem(targets[i], networkInterface)
if !killed && err != nil {
errList = append(errList, err.Error())
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
log.Info("[Chaos]: Stopping the experiment")
// cleaning the netem process after chaos injection
if err = tc.Killnetem(targetPID); err != nil {
return err
}
return nil
}
// injectChaos inject the network chaos in target container
//GetPID extract out the pid of target container
func GetPID(experimentDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentDetails.AppNS).Get(experimentDetails.TargetPod, v1.GetOptions{})
if err != nil {
return 0, err
}
var containerID string
// filtering out the container id from the details of containers inside containerStatuses of the given pod
// container id is present in the form of <runtime>://<container-id>
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentDetails.TargetContainer {
containerID = strings.Split(container.ContainerID, "//")[1]
break
}
}
log.Infof("containerid: %v", containerID)
// deriving pid from the inspect out of target container
out, err := exec.Command("crictl", "inspect", containerID).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[cri]: Failed to run crictl: %s", string(out)))
return 0, err
}
// parsing data from the json output of inspect command
PID, err := parsePIDFromJSON(out, experimentDetails.ContainerRuntime)
if err != nil {
log.Error(fmt.Sprintf("[cri]: Failed to parse json from crictl output: %s", string(out)))
return 0, err
}
log.Info(fmt.Sprintf("[cri]: Container ID=%s has process PID=%d", containerID, PID))
return PID, nil
}
// InspectResponse JSON representation of crictl inspect command output
// in crio, pid is present inside pid attribute of inspect output
// in containerd, pid is present inside `info.pid` of inspect output
type InspectResponse struct {
Info InfoDetails `json:"info"`
}
// InfoDetails JSON representation of crictl inspect command output
// in crio, pid is present inside pid attribute of inspect output
// in containerd, pid is present inside `info.pid` of inspect output
type InfoDetails struct {
PID int `json:"pid"`
}
//parsePIDFromJSON extract the pid from the json output
func parsePIDFromJSON(j []byte, runtime string) (int, error) {
var pid int
// in crio, pid is present inside pid attribute of inspect output
// in containerd, pid is present inside `info.pid` of inspect output
if runtime == "containerd" {
var resp InspectResponse
if err := json.Unmarshal(j, &resp); err != nil {
return 0, err
}
pid = resp.Info.PID
} else if runtime == "crio" {
var resp InfoDetails
if err := json.Unmarshal(j, &resp); err != nil {
return 0, errors.Errorf("[cri]: Could not find pid field in json: %s", string(j))
}
pid = resp.PID
} else {
return 0, errors.Errorf("[cri]: No supported container runtime, runtime: %v", runtime)
}
if pid == 0 {
return 0, errors.Errorf("[cri]: No running target container found, pid: %d", pid)
}
return pid, nil
}
// InjectChaos inject the network chaos in target container
// it is using nsenter command to enter into network namespace of target container
// and execute the netem command inside it.
func injectChaos(netInterface string, target targetDetails) error {
func InjectChaos(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
netemCommands := os.Getenv("NETEM_COMMAND")
targetIPs := os.Getenv("TARGET_IPs")
if len(target.DestinationIps) == 0 && len(sPorts) == 0 && len(dPorts) == 0 && len(whitelistDPorts) == 0 && len(whitelistSPorts) == 0 {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %s root %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create tc rules", target.Source); err != nil {
if targetIPs == "" {
tc := fmt.Sprintf("nsenter -t %d -n tc qdisc add dev %s root netem %v", pid, experimentDetails.NetworkInterface, netemCommands)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
} else {
ips := strings.Split(targetIPs, ",")
var uniqueIps []string
// removing duplicates ips from the list, if any
for i := range ips {
isPresent := false
for j := range uniqueIps {
if ips[i] == uniqueIps[j] {
isPresent = true
}
}
if !isPresent {
uniqueIps = append(uniqueIps, ips[i])
}
}
// Create a priority-based queue
// This instantly creates classes 1:1, 1:2, 1:3
priority := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v root handle 1: prio", target.NetworkNsPath, netInterface)
log.Info(priority)
if err := common.RunBashCommand(priority, "failed to create priority-based queue", target.Source); err != nil {
priority := fmt.Sprintf("nsenter -t %v -n tc qdisc add dev %v root handle 1: prio", pid, experimentDetails.NetworkInterface)
cmd := exec.Command("/bin/bash", "-c", priority)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
// Add queueing discipline for 1:3 class.
// No traffic is going through 1:3 yet
traffic := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v parent 1:3 %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(traffic)
if err := common.RunBashCommand(traffic, "failed to create netem queueing discipline", target.Source); err != nil {
traffic := fmt.Sprintf("nsenter -t %v -n tc qdisc add dev %v parent 1:3 netem %v", pid, experimentDetails.NetworkInterface, netemCommands)
cmd = exec.Command("/bin/bash", "-c", traffic)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
if len(whitelistDPorts) != 0 || len(whitelistSPorts) != 0 {
for _, port := range whitelistDPorts {
//redirect traffic to specific dport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip dport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist dport match filters", target.Source); err != nil {
return err
}
}
for _, ip := range uniqueIps {
for _, port := range whitelistSPorts {
//redirect traffic to specific sport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip sport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist sport match filters", target.Source); err != nil {
return err
}
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst 0.0.0.0/0 flowid 1:3", target.NetworkNsPath, netInterface)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create rule for all ports match filters", target.Source); err != nil {
return err
}
} else {
for i := range target.DestinationIps {
var (
ip = target.DestinationIps[i]
ports []string
isIPV6 = strings.Contains(target.DestinationIps[i], ":")
)
// extracting the destination ports from the ips
// ip format is ip(|port1|port2....|portx)
if strings.Contains(target.DestinationIps[i], "|") {
ip = strings.Split(target.DestinationIps[i], "|")[0]
ports = strings.Split(target.DestinationIps[i], "|")[1:]
}
// redirect traffic to specific IP through band 3
filter := fmt.Sprintf("match ip dst %v", ip)
if isIPV6 {
filter = fmt.Sprintf("match ip6 dst %v", ip)
}
if len(ports) != 0 {
for _, port := range ports {
portFilter := fmt.Sprintf("%s match ip dport %v 0xffff", filter, port)
if isIPV6 {
portFilter = fmt.Sprintf("%s match ip6 dport %v 0xffff", filter, port)
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, portFilter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
continue
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, filter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
for _, port := range sPorts {
//redirect traffic to specific sport through band 3
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip sport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create source ports match filters", target.Source); err != nil {
return err
}
}
for _, port := range dPorts {
//redirect traffic to specific dport through band 3
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ports match filters", target.Source); err != nil {
// redirect traffic to specific IP through band 3
// It allows ipv4 addresses only
if !strings.Contains(ip, ":") {
tc := fmt.Sprintf("nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst %v flowid 1:3", pid, experimentDetails.NetworkInterface, ip)
cmd = exec.Command("/bin/bash", "-c", tc)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
}
}
}
log.Infof("chaos injected successfully on {pod: %v, container: %v}", target.Name, target.TargetContainer)
return nil
}
// killnetem kill the netem process for all the target containers
func killnetem(target targetDetails, networkInterface string) (bool, error) {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc delete dev %s root", target.NetworkNsPath, networkInterface)
// Killnetem kill the netem process for all the target containers
func Killnetem(PID int) error {
tc := fmt.Sprintf("nsenter -t %d -n tc qdisc delete dev eth0 root", PID)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Info(cmd.String())
// ignoring err if qdisc process doesn't exist inside the target container
if strings.Contains(string(out), qdiscNotFound) || strings.Contains(string(out), qdiscNoFileFound) {
log.Warn("The network chaos process has already been removed")
return true, err
}
log.Error(err.Error())
return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", target.Name, target.Namespace, target.TargetContainer), Reason: fmt.Sprintf("failed to revert network faults: %s", string(out))}
log.Error(string(out))
return err
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", target.Name, target.Namespace, target.TargetContainer)
return true, nil
return nil
}
type targetDetails struct {
Name string
Namespace string
ServiceMesh string
DestinationIps []string
TargetContainer string
ContainerId string
Source string
NetworkNsPath string
//GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = Getenv("EXPERIMENT_NAME", "")
experimentDetails.AppNS = Getenv("APP_NS", "")
experimentDetails.TargetContainer = Getenv("APP_CONTAINER", "")
experimentDetails.TargetPod = Getenv("APP_POD", "")
experimentDetails.AppLabel = Getenv("APP_LABEL", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = Getenv("CHAOS_ENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = Getenv("CONTAINER_RUNTIME", "")
experimentDetails.NetworkInterface = Getenv("NETWORK_INTERFACE", "eth0")
experimentDetails.TargetIPs = Getenv("TARGET_IPs", "")
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.DestinationIPs = types.Getenv("DESTINATION_IPS", "")
experimentDetails.SourcePorts = types.Getenv("SOURCE_PORTS", "")
experimentDetails.DestinationPorts = types.Getenv("DESTINATION_PORTS", "")
if strings.TrimSpace(experimentDetails.DestinationPorts) != "" {
if strings.Contains(experimentDetails.DestinationPorts, "!") {
whitelistDPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.DestinationPorts), "!"), ",")
} else {
dPorts = strings.Split(strings.TrimSpace(experimentDetails.DestinationPorts), ",")
}
}
if strings.TrimSpace(experimentDetails.SourcePorts) != "" {
if strings.Contains(experimentDetails.SourcePorts, "!") {
whitelistSPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.SourcePorts), "!"), ",")
} else {
sPorts = strings.Split(strings.TrimSpace(experimentDetails.SourcePorts), ",")
}
// Getenv fetch the env and set the default value, if any
func Getenv(key string, defaultValue string) string {
value := os.Getenv(key)
if value == "" {
value = defaultValue
}
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(targets []targetDetails, networkInterface, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
killed, err := killnetem(t, networkInterface)
if err != nil && !killed {
log.Errorf("unable to kill netem process, err :%v", err)
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
func getDestIps(serviceMesh string) []string {
var (
destIps = os.Getenv("DESTINATION_IPS")
uniqueIps []string
)
if serviceMesh == "true" {
destIps = os.Getenv("DESTINATION_IPS_SERVICE_MESH")
}
if strings.TrimSpace(destIps) == "" {
return nil
}
ips := strings.Split(strings.TrimSpace(destIps), ",")
// removing duplicates ips from the list, if any
for i := range ips {
if !common.Contains(ips[i], uniqueIps) {
uniqueIps = append(uniqueIps, ips[i])
}
}
return uniqueIps
return value
}
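
Review note on the env handling above: one side of the diff treats a `!` in `DESTINATION_PORTS` or `SOURCE_PORTS` as marking a whitelist, and de-duplicates the destination IP list before building filters. The following is a minimal standard-library sketch of those two steps. `parsePorts` and `uniqueIPs` are hypothetical names used for illustration; the diff itself stores the results in package-level slices and uses `common.Contains` for the duplicate check.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePorts splits a comma-separated port spec. As in the diff, a "!"
// anywhere in the spec marks it as a whitelist, and a leading "!" is
// stripped before splitting. (Hypothetical helper, not part of litmus-go.)
func parsePorts(spec string) (ports []string, whitelist bool) {
	spec = strings.TrimSpace(spec)
	if spec == "" {
		return nil, false
	}
	if strings.Contains(spec, "!") {
		return strings.Split(strings.TrimPrefix(spec, "!"), ","), true
	}
	return strings.Split(spec, ","), false
}

// uniqueIPs splits a comma-separated IP list and drops duplicates while
// preserving first-seen order. A map is used here instead of the linear
// scan in the diff; the result is the same.
func uniqueIPs(csv string) []string {
	csv = strings.TrimSpace(csv)
	if csv == "" {
		return nil
	}
	seen := make(map[string]bool)
	var out []string
	for _, ip := range strings.Split(csv, ",") {
		if !seen[ip] {
			seen[ip] = true
			out = append(out, ip)
		}
	}
	return out
}

func main() {
	ports, whitelist := parsePorts("!80,443")
	fmt.Println(ports, whitelist)
	fmt.Println(uniqueIPs("10.0.0.1,10.0.0.2,10.0.0.1"))
}
```

In the diff, whitelisted ports are steered to band `1:2` of the `prio` qdisc while all other traffic goes to band `1:3`, where the netem discipline is attached.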


@ -1,26 +1,24 @@
package corruption
import (
"context"
"fmt"
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkCorruptionFault")
defer span.End()
var err error
args := "netem corrupt " + experimentsDetails.NetworkPacketCorruptionPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
//PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := "corrupt " + strconv.Itoa(experimentsDetails.NetworkPacketCorruptionPercentage)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
return nil
}
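
Review note on the argument string above: the side of the diff that reads `experimentsDetails.Correlation` prefixes the fault with `netem` and appends the correlation value only when it is positive, and the loss and duplication files follow the same pattern. The following is a minimal sketch of that construction. `netemArgs` is a hypothetical helper introduced for illustration; the diff builds the string inline in each package.

```go
package main

import "fmt"

// netemArgs builds the argument string later passed to "tc qdisc ... root".
// fault is the netem keyword (e.g. "corrupt", "loss", "duplicate") and value
// is its parameter as a string. The correlation is appended only when it is
// greater than zero. (Hypothetical helper, not part of litmus-go.)
func netemArgs(fault, value string, correlation int) string {
	args := "netem " + fault + " " + value
	if correlation > 0 {
		args = fmt.Sprintf("%s %d", args, correlation)
	}
	return args
}

func main() {
	fmt.Println(netemArgs("corrupt", "100", 0))
	fmt.Println(netemArgs("loss", "50", 25))
}
```

The other side of the diff omits the `netem` prefix here because its helper inserts the keyword when formatting the `tc` command.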


@ -1,26 +1,24 @@
package duplication
import (
"context"
"fmt"
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkDuplicationFault")
defer span.End()
var err error
args := "netem duplicate " + experimentsDetails.NetworkPacketDuplicationPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
//PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := "duplicate " + strconv.Itoa(experimentsDetails.NetworkPacketDuplicationPercentage)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
return nil
}


@ -1,27 +1,24 @@
package latency
import (
"context"
"fmt"
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLatencyFault")
defer span.End()
var err error
args := "netem delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms " + strconv.Itoa(experimentsDetails.Jitter) + "ms"
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
//PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := "delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms"
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
return nil
}
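
The context-aware variant of `PodNetworkLatencyChaos` in this diff builds its `netem` argument string from the latency, the jitter, and an optional correlation value. A minimal standalone sketch of that string assembly follows. `buildLatencyArgs` is a hypothetical helper taking plain integers, not a function from litmus-go, and the sample values are made up.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildLatencyArgs is a hypothetical helper (not part of litmus-go).
// It mirrors the string assembly shown in the diff: latency and jitter
// are rendered in milliseconds, and the correlation value is appended
// only when it is greater than zero.
func buildLatencyArgs(latencyMs, jitterMs, correlation int) string {
	args := "netem delay " + strconv.Itoa(latencyMs) + "ms " + strconv.Itoa(jitterMs) + "ms"
	if correlation > 0 {
		args = fmt.Sprintf("%s %d", args, correlation)
	}
	return args
}

func main() {
	fmt.Println(buildLatencyArgs(2000, 0, 0))    // netem delay 2000ms 0ms
	fmt.Println(buildLatencyArgs(2000, 100, 25)) // netem delay 2000ms 100ms 25
}
```

Keeping the string assembly in a pure function like this makes it testable without a cluster or a helper pod.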


@ -1,26 +1,24 @@
package loss
import (
"context"
"fmt"
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLossFault")
defer span.End()
var err error
args := "netem loss " + experimentsDetails.NetworkPacketLossPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
//PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := "loss " + strconv.Itoa(experimentsDetails.NetworkPacketLossPercentage)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
return nil
}


@ -1,51 +1,31 @@
package lib
import (
"context"
"fmt"
"net"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
"github.com/pkg/errors"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var serviceMesh = []string{"istio", "envoy"}
var destIpsSvcMesh string
var destIps string
var err error
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
//PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
var err error
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
logExperimentFields(experimentsDetails)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
//Waiting for the ramp time before chaos injection
@ -56,143 +36,105 @@ func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTy
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
err = GetServiceAccount(experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode inject the network chaos in all target application serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
runID := common.GetRunID()
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, args)
if err != nil {
return stacktrace.Propagate(err, "could not set destination ips")
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
runID := stringutils.GetRunID()
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", clients, experimentsDetails.ChaosDuration+60, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed due to, err: %v", err)
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer, serviceMesh), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
//Deleting all the helper pods for network chaos
log.Info("[Cleanup]: Deleting all the helper pod")
err = common.DeleteAllPod("app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pods, err: %v", err)
}
return nil
}
// injectChaosInParallelMode inject the network chaos in all target application in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInParallelMode")
defer span.End()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
targets, err := filterPodsForNodes(targetPodList, experimentsDetails, clients)
// GetServiceAccount find the serviceAccountName for the helper pod
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
return stacktrace.Propagate(err, "could not filter target pods")
}
runID := stringutils.GetRunID()
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s:%s", k.Name, k.Namespace, k.TargetContainer, k.ServiceMesh))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets string, nodeName, runID, args string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodNetworkFaultHelperPod")
defer span.End()
//GetTargetContainer will fetch the container name from application pod
//This container will be used as target container
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
var (
privilegedEnable = true
terminationGracePeriodSeconds = int64(experimentsDetails.TerminationGracePeriodSeconds)
helperName = fmt.Sprintf("%s-helper-%s", experimentsDetails.ExperimentName, stringutils.GetRunID())
)
return pod.Spec.Containers[0].Name, nil
}
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, podName, nodeName, runID string, args string) error {
privilegedEnable := false
if experimentsDetails.ContainerRuntime == "crio" {
privilegedEnable = true
}
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: helperName,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper",
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Tolerations: chaosDetails.Tolerations,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
HostPID: true,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "cri-socket",
@ -202,27 +144,38 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
},
},
},
{
Name: "cri-config",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/etc/crictl.yaml",
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
ImagePullPolicy: apiv1.PullAlways,
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"./helpers -name network-chaos",
"./experiments/network-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets, args),
Env: GetPodEnv(experimentsDetails, podName, args),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
MountPath: experimentsDetails.SocketPath,
},
{
Name: "cri-config",
MountPath: "/etc/crictl.yaml",
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
@ -238,310 +191,85 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
// mount the network ns path for crio runtime
// it is required to access the sandbox network ns
if strings.ToLower(experimentsDetails.ContainerRuntime) == "crio" {
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, apiv1.Volume{
Name: "netns-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/netns",
},
},
})
helperPod.Spec.Containers[0].VolumeMounts = append(helperPod.Spec.Containers[0].VolumeMounts, apiv1.VolumeMount{
Name: "netns-path",
MountPath: "/var/run/netns",
})
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derive all the env required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string, args string) []apiv1.EnvVar {
// GetPodEnv derive all the env required for the helper pod
func GetPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, args string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("NETEM_COMMAND", args).
SetEnv("NETWORK_INTERFACE", experimentsDetails.NetworkInterface).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("DESTINATION_IPS", destIps).
SetEnv("DESTINATION_IPS_SERVICE_MESH", destIpsSvcMesh).
SetEnv("SOURCE_PORTS", experimentsDetails.SourcePorts).
SetEnv("DESTINATION_PORTS", experimentsDetails.DestinationPorts).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
type targetsDetails struct {
Target []target
}
type target struct {
Namespace string
Name string
TargetContainer string
ServiceMesh string
}
// GetTargetIps return the comma separated target ips
// It fetches the ips from the target ips (if defined by users)
// it appends the ips from the host, if target host is provided
func GetTargetIps(targetIPs, targetHosts string, clients clients.ClientSets, serviceMesh bool) (string, error) {
ipsFromHost, err := getIpsForTargetHosts(targetHosts, clients, serviceMesh)
if err != nil {
return "", stacktrace.Propagate(err, "could not get ips from target hosts")
var envVar []apiv1.EnvVar
ENVList := map[string]string{
"APP_NS": experimentsDetails.AppNS,
"APP_POD": podName,
"APP_CONTAINER": experimentsDetails.TargetContainer,
"TOTAL_CHAOS_DURATION": strconv.Itoa(experimentsDetails.ChaosDuration),
"CHAOS_NAMESPACE": experimentsDetails.ChaosNamespace,
"CHAOS_ENGINE": experimentsDetails.EngineName,
"CHAOS_UID": string(experimentsDetails.ChaosUID),
"CONTAINER_RUNTIME": experimentsDetails.ContainerRuntime,
"NETEM_COMMAND": args,
"NETWORK_INTERFACE": experimentsDetails.NetworkInterface,
"EXPERIMENT_NAME": experimentsDetails.ExperimentName,
"TARGET_IPs": GetTargetIpsArgs(experimentsDetails.TargetIPs, experimentsDetails.TargetHosts),
}
if targetIPs == "" {
targetIPs = ipsFromHost
} else if ipsFromHost != "" {
for key, value := range ENVList {
var perEnv apiv1.EnvVar
perEnv.Name = key
perEnv.Value = value
envVar = append(envVar, perEnv)
}
// Getting experiment pod name from downward API
experimentPodName := GetValueFromDownwardAPI("v1", "metadata.name")
var downwardEnv apiv1.EnvVar
downwardEnv.Name = "POD_NAME"
downwardEnv.ValueFrom = &experimentPodName
envVar = append(envVar, downwardEnv)
return envVar
}
// GetValueFromDownwardAPI returns the value from downwardApi
func GetValueFromDownwardAPI(apiVersion string, fieldPath string) apiv1.EnvVarSource {
downwardENV := apiv1.EnvVarSource{
FieldRef: &apiv1.ObjectFieldSelector{
APIVersion: apiVersion,
FieldPath: fieldPath,
},
}
return downwardENV
}
// GetTargetIpsArgs return the comma separated target ips
// It fetch the ips from the target ips (if defined by users)
// it append the ips from the host, if target host is provided
func GetTargetIpsArgs(targetIPs, targetHosts string) string {
ipsFromHost := GetIpsForTargetHosts(targetHosts)
if ipsFromHost != "" {
targetIPs = targetIPs + "," + ipsFromHost
}
return targetIPs, nil
return targetIPs
}
// it derives the pod ips from the kubernetes service
func getPodIPFromService(host string, clients clients.ClientSets) ([]string, error) {
var ips []string
svcFields := strings.Split(host, ".")
if len(svcFields) != 5 {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{host: %s}", host), Reason: "provide the valid FQDN for service in '<svc-name>.<namespace>.svc.cluster.local format"}
}
svcName, svcNs := svcFields[0], svcFields[1]
svc, err := clients.GetService(svcNs, svcName)
if err != nil {
if k8serrors.IsForbidden(err) {
log.Warnf("forbidden - failed to get %v service in %v namespace, err: %v", svcName, svcNs, err)
return ips, nil
}
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{serviceName: %s, namespace: %s}", svcName, svcNs), Reason: err.Error()}
}
if svc.Spec.Selector == nil {
return nil, nil
}
var svcSelector string
for k, v := range svc.Spec.Selector {
if svcSelector == "" {
svcSelector += fmt.Sprintf("%s=%s", k, v)
continue
}
svcSelector += fmt.Sprintf(",%s=%s", k, v)
}
pods, err := clients.ListPods(svcNs, svcSelector)
if err != nil {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{svcName: %s,podLabel: %s, namespace: %s}", svcNs, svcSelector, svcNs), Reason: fmt.Sprintf("failed to derive pods from service: %s", err.Error())}
}
for _, p := range pods.Items {
if p.Status.PodIP == "" {
continue
}
ips = append(ips, p.Status.PodIP)
}
return ips, nil
}
// getIpsForTargetHosts resolves IP addresses for comma-separated list of target hosts and returns comma-separated ips
func getIpsForTargetHosts(targetHosts string, clients clients.ClientSets, serviceMesh bool) (string, error) {
// GetIpsForTargetHosts resolves IP addresses for comma-separated list of target hosts and returns comma-separated ips
func GetIpsForTargetHosts(targetHosts string) string {
if targetHosts == "" {
return "", nil
return ""
}
hosts := strings.Split(targetHosts, ",")
finalHosts := ""
var commaSeparatedIPs []string
for i := range hosts {
hosts[i] = strings.TrimSpace(hosts[i])
var (
hostName = hosts[i]
ports []string
)
if strings.Contains(hosts[i], "|") {
host := strings.Split(hosts[i], "|")
hostName = host[0]
ports = host[1:]
log.Infof("host and port: %v :%v", hostName, ports)
}
if strings.Contains(hostName, "svc.cluster.local") && serviceMesh {
ips, err := getPodIPFromService(hostName, clients)
if err != nil {
return "", stacktrace.Propagate(err, "could not get pod ips from service")
}
log.Infof("Host: {%v}, IP address: {%v}", hosts[i], ips)
if ports != nil {
for j := range ips {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j]+"|"+strings.Join(ports, "|"))
}
} else {
commaSeparatedIPs = append(commaSeparatedIPs, ips...)
}
if finalHosts == "" {
finalHosts = hosts[i]
} else {
finalHosts = finalHosts + "," + hosts[i]
}
continue
}
ips, err := net.LookupIP(hostName)
ips, err := net.LookupIP(hosts[i])
if err != nil {
log.Warnf("Unknown host: {%v}, it won't be included in the scope of chaos", hostName)
log.Infof("Unknown host")
} else {
for j := range ips {
log.Infof("Host: {%v}, IP address: {%v}", hostName, ips[j])
if ports != nil {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String()+"|"+strings.Join(ports, "|"))
continue
}
log.Infof("IP address: %v", ips[j])
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String())
}
if finalHosts == "" {
finalHosts = hosts[i]
} else {
finalHosts = finalHosts + "," + hosts[i]
}
}
}
if len(commaSeparatedIPs) == 0 {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("hosts: %s", targetHosts), Reason: "provided hosts are invalid, unable to resolve"}
}
log.Infof("Injecting chaos on {%v} hosts", finalHosts)
return strings.Join(commaSeparatedIPs, ","), nil
}
// SetChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.NetworkPacketLossPercentage = common.ValidateRange(experimentsDetails.NetworkPacketLossPercentage)
experimentsDetails.NetworkPacketCorruptionPercentage = common.ValidateRange(experimentsDetails.NetworkPacketCorruptionPercentage)
experimentsDetails.NetworkPacketDuplicationPercentage = common.ValidateRange(experimentsDetails.NetworkPacketDuplicationPercentage)
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
// It checks if pod contains service mesh sidecar
func isServiceMeshEnabledForPod(pod apiv1.Pod) bool {
for _, c := range pod.Spec.Containers {
if common.SubStringExistsInSlice(c.Name, serviceMesh) {
return true
}
}
return false
}
func setDestIps(pod apiv1.Pod, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
var err error
if isServiceMeshEnabledForPod(pod) {
if destIpsSvcMesh == "" {
destIpsSvcMesh, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, true)
if err != nil {
return "false", err
}
}
return "true", nil
}
if destIps == "" {
destIps, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, false)
if err != nil {
return "false", err
}
}
return "false", nil
}
func filterPodsForNodes(targetPodList apiv1.PodList, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (map[string]*targetsDetails, error) {
targets := make(map[string]*targetsDetails)
targetContainer := experimentsDetails.TargetContainer
for _, pod := range targetPodList.Items {
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
if err != nil {
return targets, stacktrace.Propagate(err, "could not set destination ips")
}
if experimentsDetails.TargetContainer == "" {
targetContainer = pod.Spec.Containers[0].Name
}
td := target{
Name: pod.Name,
Namespace: pod.Namespace,
TargetContainer: targetContainer,
ServiceMesh: serviceMesh,
}
if targets[pod.Spec.NodeName] == nil {
targets[pod.Spec.NodeName] = &targetsDetails{
Target: []target{td},
}
} else {
targets[pod.Spec.NodeName].Target = append(targets[pod.Spec.NodeName].Target, td)
}
}
return targets, nil
}
func logExperimentFields(experimentsDetails *experimentTypes.ExperimentDetails) {
switch experimentsDetails.NetworkChaosType {
case "network-loss":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketLossPercentage": experimentsDetails.NetworkPacketLossPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-latency":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkLatency": strconv.Itoa(experimentsDetails.NetworkLatency),
"Jitter": experimentsDetails.Jitter,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-corruption":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketCorruptionPercentage": experimentsDetails.NetworkPacketCorruptionPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-duplication":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketDuplicationPercentage": experimentsDetails.NetworkPacketDuplicationPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-rate-limit":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkBandwidth": experimentsDetails.NetworkBandwidth,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
}
return strings.Join(commaSeparatedIPs, ",")
}
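
The longer host-resolution variant in this diff accepts entries of the form `host|port|port`, resolves the host part, and re-attaches the ports to each resolved IP. Below is a minimal sketch of that idea, assuming a resolver passed in as a function so the logic can run without DNS. `resolveTargets`, `lookupFunc`, `fakeLookup`, and the host names are hypothetical and not part of litmus-go. Unlike the code in the diff, the sketch silently skips unresolved hosts instead of logging them or returning an error.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// lookupFunc has the same signature as net.LookupIP, so the real
// resolver or a fake one can be passed in.
type lookupFunc func(host string) ([]net.IP, error)

// resolveTargets (hypothetical) splits a comma-separated host list,
// separates optional "|port" suffixes, resolves each host, and returns
// one "ip" or "ip|port|port" entry per resolved address.
func resolveTargets(targetHosts string, lookup lookupFunc) []string {
	var out []string
	for _, h := range strings.Split(targetHosts, ",") {
		h = strings.TrimSpace(h)
		if h == "" {
			continue
		}
		fields := strings.Split(h, "|")
		hostName, ports := fields[0], fields[1:]
		ips, err := lookup(hostName)
		if err != nil {
			continue // unresolved hosts are left out of the result
		}
		for _, ip := range ips {
			entry := ip.String()
			if len(ports) > 0 {
				entry += "|" + strings.Join(ports, "|")
			}
			out = append(out, entry)
		}
	}
	return out
}

// fakeLookup is a stand-in resolver that knows a single made-up host.
func fakeLookup(host string) ([]net.IP, error) {
	if host == "db.example.internal" {
		return []net.IP{net.ParseIP("10.0.0.5")}, nil
	}
	return nil, fmt.Errorf("unknown host %q", host)
}

func main() {
	ips := resolveTargets("db.example.internal|5432, missing.example.internal", fakeLookup)
	fmt.Println(strings.Join(ips, ",")) // 10.0.0.5|5432
}
```

Passing `net.LookupIP` instead of `fakeLookup` would perform real DNS lookups.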


@ -1,29 +0,0 @@
package rate
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkRateChaos contains the steps to prepare and inject chaos
func PodNetworkRateChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkRateLimit")
defer span.End()
args := fmt.Sprintf("tbf rate %s burst %s limit %s", experimentsDetails.NetworkBandwidth, experimentsDetails.Burst, experimentsDetails.Limit)
if experimentsDetails.PeakRate != "" {
args = fmt.Sprintf("%s peakrate %s", args, experimentsDetails.PeakRate)
}
if experimentsDetails.MinBurst != "" {
args = fmt.Sprintf("%s mtu %s", args, experimentsDetails.MinBurst)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
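
`PodNetworkRateChaos` composes a `tbf` argument string in which `peakrate` and `mtu` are appended only when the corresponding fields are set. A small sketch of that optional-suffix pattern follows, assuming plain string parameters. `buildRateArgs` and the sample values are hypothetical.

```go
package main

import "fmt"

// buildRateArgs is a hypothetical helper (not part of litmus-go) that
// mirrors the string assembly in the diff: rate, burst and limit are
// always present; peakrate and mtu are added only when non-empty.
func buildRateArgs(bandwidth, burst, limit, peakRate, minBurst string) string {
	args := fmt.Sprintf("tbf rate %s burst %s limit %s", bandwidth, burst, limit)
	if peakRate != "" {
		args = fmt.Sprintf("%s peakrate %s", args, peakRate)
	}
	if minBurst != "" {
		args = fmt.Sprintf("%s mtu %s", args, minBurst)
	}
	return args
}

func main() {
	fmt.Println(buildRateArgs("1mbit", "32kb", "2mb", "", ""))
	fmt.Println(buildRateArgs("1mbit", "32kb", "2mb", "2mbit", "1600"))
}
```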


@ -0,0 +1,76 @@
package cri
import (
"encoding/json"
"fmt"
"os/exec"
"strings"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/pkg/errors"
coreV1 "k8s.io/api/core/v1"
)
// PIDFromContainer extracts the PID of the target container
func PIDFromContainer(c coreV1.ContainerStatus) (int, error) {
containerID := strings.Split(c.ContainerID, "//")[1]
out, err := exec.Command("crictl", "inspect", containerID).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[cri] Failed to run crictl: %s", string(out)))
return 0, err
}
runtime := strings.Split(c.ContainerID, "://")[0]
PID, err := parsePIDFromJSON(out, runtime)
if err != nil {
log.Error(fmt.Sprintf("[cri] Failed to parse json from crictl output: %s", string(out)))
return 0, err
}
log.Info(fmt.Sprintf("[cri] Container ID=%s has process PID=%d", containerID, PID))
return PID, nil
}
// InspectResponse JSON representation of crictl inspect command output
// in crio, pid is present inside pid attribute of inspect output
// in containerd, pid is present inside `info.pid` of inspect output
type InspectResponse struct {
Info InfoDetails `json:"info"`
}
// InfoDetails holds the pid field of the crictl inspect command output
// in crio, it is decoded from the top-level pid attribute
// in containerd, it is decoded from the `info` object (see InspectResponse)
type InfoDetails struct {
PID int `json:"pid"`
}
// parsePIDFromJSON extracts the pid from the JSON output
func parsePIDFromJSON(j []byte, runtime string) (int, error) {
var pid int
// in crio, pid is present inside pid attribute of inspect output
// in containerd, pid is present inside `info.pid` of inspect output
if runtime == "containerd" {
var resp InspectResponse
if err := json.Unmarshal(j, &resp); err != nil {
return 0, err
}
pid = resp.Info.PID
} else if runtime == "crio" {
var resp InfoDetails
if err := json.Unmarshal(j, &resp); err != nil {
return 0, errors.Errorf("[cri] Could not find pid field in json: %s", string(j))
}
pid = resp.PID
} else {
return 0, errors.Errorf("no supported container runtime, runtime: %v", runtime)
}
if pid == 0 {
return 0, errors.Errorf("[cri] no running target container found, pid: %d", pid)
}
return pid, nil
}
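
The two runtime branches of parsePIDFromJSON differ only in where the pid sits in the `crictl inspect` output: at the top level for crio, under `info` for containerd. The sketch below illustrates that distinction with a single struct. The helper name `pidFromInspect` and the trimmed JSON samples are assumptions made for illustration, and it uses only the standard library rather than `github.com/pkg/errors`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inspectOutput covers both layouts: crio reports pid at the top level,
// containerd nests it under info.
type inspectOutput struct {
	PID  int `json:"pid"`
	Info struct {
		PID int `json:"pid"`
	} `json:"info"`
}

// pidFromInspect is a hypothetical helper that mirrors parsePIDFromJSON.
func pidFromInspect(j []byte, runtime string) (int, error) {
	var out inspectOutput
	if err := json.Unmarshal(j, &out); err != nil {
		return 0, fmt.Errorf("could not decode inspect output: %w", err)
	}

	var pid int
	switch runtime {
	case "containerd":
		pid = out.Info.PID
	case "crio":
		pid = out.PID
	default:
		return 0, fmt.Errorf("unsupported container runtime: %q", runtime)
	}

	// A zero pid means the container is not running (or the field is absent).
	if pid == 0 {
		return 0, fmt.Errorf("no running target container found")
	}
	return pid, nil
}

func main() {
	pid, err := pidFromInspect([]byte(`{"info":{"pid":4242}}`), "containerd")
	fmt.Println(pid, err)
}
```

Decoding into one struct keeps the runtime check in a single switch, at the cost of carrying a field that is unused for each runtime.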


@ -0,0 +1,78 @@
package cri
import "testing"
func TestCrictl(t *testing.T) {
// crictl inspect output
json := `
{
"status": {
"id": "c739a31ab698e6e1c679442a538d16cc7199703c80f030e159b5de6b46e60518",
"metadata": {
"attempt": 0,
"name": "nginx-unprivileged"
},
"state": "CONTAINER_RUNNING",
"createdAt": "2020-07-28T16:50:35.84027013Z",
"startedAt": "2020-07-28T16:50:35.996159402Z",
"finishedAt": "1970-01-01T00:00:00Z",
"exitCode": 0,
"image": {
"image": "docker.io/nginxinc/nginx-unprivileged:latest"
},
"imageRef": "docker.io/nginxinc/nginx-unprivileged@sha256:0fd19475c17fff38191ef0dd3d1b949a25fd637cd64756146cc99363e580cf3a",
"reason": "",
"message": "",
"labels": {
"io.kubernetes.container.name": "nginx-unprivileged",
"io.kubernetes.pod.name": "app-7f99cf5459-gdqw7",
"io.kubernetes.pod.namespace": "myteam",
"io.kubernetes.pod.uid": "d2368c41-679f-40a8-aa5d-6a763876ef06"
},
"annotations": {
"io.kubernetes.container.hash": "ddf9b623",
"io.kubernetes.container.restartCount": "0",
"io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
"io.kubernetes.container.terminationMessagePolicy": "File",
"io.kubernetes.pod.terminationGracePeriod": "30"
},
"mounts": [
{
"containerPath": "/etc/hosts",
"hostPath": "/var/lib/kubelet/pods/d2368c41-679f-40a8-aa5d-6a763876ef06/etc-hosts",
"propagation": "PROPAGATION_PRIVATE",
"readonly": false,
"selinuxRelabel": false
},
{
"containerPath": "/dev/termination-log",
"hostPath": "/var/lib/kubelet/pods/d2368c41-679f-40a8-aa5d-6a763876ef06/containers/nginx-unprivileged/00467287",
"propagation": "PROPAGATION_PRIVATE",
"readonly": false,
"selinuxRelabel": false
},
{
"containerPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"hostPath": "/var/lib/kubelet/pods/d2368c41-679f-40a8-aa5d-6a763876ef06/volumes/kubernetes.io~secret/default-token-8lf4k",
"propagation": "PROPAGATION_PRIVATE",
"readonly": true,
"selinuxRelabel": false
}
],
"logPath": "/var/log/pods/myteam_app-7f99cf5459-gdqw7_d2368c41-679f-40a8-aa5d-6a763876ef06/nginx-unprivileged/0.log"
},
"pid": 72496,
"sandboxId": "e978d37294a29c4a7f3f668f44f33431d4b9b892e415fcddfcdf71a8d047a2f7"
}`
expectedPID := 72496
PID, err := parsePIDFromJSON([]byte(json), "crio")
if err != nil {
t.Fatalf("Fail to parse json: %s", err)
}
if PID != expectedPID {
t.Errorf("Fail to parse PID from json. Expected %d, got %d", expectedPID, PID)
}
}


@ -0,0 +1,66 @@
package ip
import (
"encoding/json"
"fmt"
"os/exec"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/pkg/errors"
)
// InterfaceName returns the name of the ethernet interface of the given
// process (container). It returns an error in case none, or more than one,
// interface is present.
func InterfaceName(PID int) (string, error) {
ip := fmt.Sprintf("nsenter -t %d -n ip -json link list", PID)
cmd := exec.Command("/bin/bash", "-c", ip)
out, err := cmd.CombinedOutput()
log.Info(fmt.Sprintf("[ip] %s", cmd))
if err != nil {
log.Error(fmt.Sprintf("[ip] Failed to run ip command: %s", string(out)))
return "", err
}
links, err := parseLinksResponse(out)
if err != nil {
log.Errorf("[ip] Failed to parse json response from ip command: %v", err)
return "", err
}
ls := []Link{}
for _, iface := range links {
if iface.Type != "loopback" {
ls = append(ls, iface)
}
}
log.Info(fmt.Sprintf("[ip] Found %d link interface(s): %+v", len(ls), ls))
if len(ls) != 1 {
return "", errors.Errorf("[ip] Unexpected number of link interfaces for process %d. Expected 1 ethernet link, found %d",
PID, len(ls))
}
return ls[0].Name, nil
}
type LinkListResponse struct {
Links []Link
}
type Link struct {
Name string `json:"ifname"`
Type string `json:"link_type"`
Qdisc string `json:"qdisc"`
NSID int `json:"link_netnsid"`
}
func parseLinksResponse(j []byte) ([]Link, error) {
var links []Link
err := json.Unmarshal(j, &links)
if err != nil {
return nil, err
}
return links, nil
}
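
InterfaceName documents that exactly one non-loopback link is expected. A minimal sketch of that selection rule, working directly on `ip -json link list` output, could look as follows. The helper name `singleInterface` and the shortened JSON are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// link keeps only the fields needed to pick the interface.
type link struct {
	Name string `json:"ifname"`
	Type string `json:"link_type"`
}

// singleInterface is a hypothetical helper: it returns the name of the only
// non-loopback link and fails when there are zero or several candidates.
func singleInterface(j []byte) (string, error) {
	var links []link
	if err := json.Unmarshal(j, &links); err != nil {
		return "", err
	}

	var names []string
	for _, l := range links {
		if l.Type != "loopback" {
			names = append(names, l.Name)
		}
	}

	if len(names) != 1 {
		return "", fmt.Errorf("expected 1 non-loopback link, found %d", len(names))
	}
	return names[0], nil
}

func main() {
	out := `[{"ifname":"lo","link_type":"loopback"},{"ifname":"eth0","link_type":"ether"}]`
	name, err := singleInterface([]byte(out))
	fmt.Println(name, err)
}
```

Checking the count before indexing avoids a panic on an empty slice when the namespace only contains the loopback device.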


@ -0,0 +1,58 @@
package ip
import "testing"
func TestIpLinkList(t *testing.T) {
json := `
[
{
"ifindex":1,
"ifname":"lo",
"flags":[
"LOOPBACK",
"UP",
"LOWER_UP"
],
"mtu":65536,
"qdisc":"noqueue",
"operstate":"UNKNOWN",
"linkmode":"DEFAULT",
"group":"default",
"txqlen":1000,
"link_type":"loopback",
"address":"00:00:00:00:00:00",
"broadcast":"00:00:00:00:00:00"
},
{
"ifindex":3,
"link_index":27,
"ifname":"eth0",
"flags": [
"BROADCAST",
"MULTICAST",
"UP",
"LOWER_UP"
],
"mtu":1450,
"qdisc":"noqueue",
"operstate":"UP",
"linkmode":"DEFAULT",
"group":"default",
"link_type":"ether",
"address":"0a:58:0a:80:00:0f",
"broadcast":"ff:ff:ff:ff:ff:ff",
"link_netnsid":0
}
]
`
links, err := parseLinksResponse([]byte(json))
if err != nil {
t.Fatalf("Failed to parse ip link json: %s", err)
}
expected := 2
got := len(links)
if got != expected {
t.Errorf("Failed to parse ip link json. Expected %d, got %d: %v", expected, got, links)
}
}


@ -0,0 +1,150 @@
package network_latency
import (
"fmt"
. "fmt"
"os"
"os/signal"
"strconv"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/network_latency/cri"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/network_latency/tc"
"github.com/litmuschaos/litmus-go/pkg/clients"
env "github.com/litmuschaos/litmus-go/pkg/generic/network-latency/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-latency/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareNetwork function orchestrates the experiment
func PrepareNetwork(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails) error {
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
log.Info("[Chaos Target]: Fetching dependency info")
deps, err := env.Dependencies()
if err != nil {
Println(err.Error())
return err
}
log.Info("[Chaos Target]: Resolving latency target IPs")
conf, err := env.Resolver(deps)
if err != nil {
Println(err.Error())
return err
}
log.Info("[Chaos Target]: Finding the container PID")
targetPIDs, err := ChaosTargetPID(experimentsDetails.ChaosNode, experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
if err != nil {
Println(err.Error())
return err
}
for _, targetPID := range targetPIDs {
log.Info(fmt.Sprintf("[Chaos]: Apply latency to process PID=%d", targetPID))
err := tc.CreateDelayQdisc(targetPID, experimentsDetails.Latency, experimentsDetails.Jitter)
if err != nil {
log.Error("Failed to create delay, aborting experiment")
return err
}
for i, ip := range conf.IP {
port := conf.Port[i]
err = tc.AddIPFilter(targetPID, ip, port)
if err != nil {
Println(err.Error())
return err
}
}
}
log.Infof("[Chaos]: Waiting for %vs", strconv.Itoa(experimentsDetails.ChaosDuration))
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
// SIGKILL cannot be caught by a process, so it is not registered here.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
for _, targetPID := range targetPIDs {
err = tc.Killnetem(targetPID)
if err != nil {
Println(err.Error())
}
}
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
log.Info("[Chaos]: Stopping the experiment")
for _, targetPID := range targetPIDs {
err = tc.Killnetem(targetPID)
if err != nil {
Println(err.Error())
}
}
if err != nil {
Println(err.Error())
return err
}
return nil
}
//ChaosTargetPID finds the target app PIDs
func ChaosTargetPID(chaosNode string, appNs string, appLabel string, clients clients.ClientSets) ([]int, error) {
podList, err := clients.KubeClient.CoreV1().Pods(appNs).List(metav1.ListOptions{
LabelSelector: appLabel,
FieldSelector: "spec.nodeName=" + chaosNode,
})
if err != nil {
return []int{}, err
}
if len(podList.Items) == 0 {
return []int{}, errors.Errorf("No pods with label %s were found in the namespace %s", appLabel, appNs)
}
PIDs := []int{}
for _, pod := range podList.Items {
if len(pod.Status.ContainerStatuses) == 0 {
return []int{}, errors.Errorf("Unreachable: No containers running in this pod: %+v", pod)
}
// containers in a pod share the network namespace, so any one of them
// should be fine for our purposes
container := pod.Status.ContainerStatuses[0]
log.InfoWithValues("Found target container", logrus.Fields{
"container": container.Name,
"Pod": pod.Name,
"Status": pod.Status.Phase,
"containerID": container.ContainerID,
})
PID, err := cri.PIDFromContainer(container)
if err != nil {
return []int{}, err
}
PIDs = append(PIDs, PID)
}
log.Info(Sprintf("Found %d target process(es)", len(PIDs)))
return PIDs, nil
}
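
The wait loop in PrepareNetwork boils down to "block until the chaos duration elapses or a termination signal arrives". The sketch below isolates that pattern. The helper name `waitOrAbort` is an assumption, and it is written as a generic function (Go 1.18 or later) only so that it can be exercised without delivering real signals.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// waitOrAbort is a hypothetical helper: it blocks until d elapses or a value
// arrives on abort, and reports whether the wait was interrupted.
func waitOrAbort[T any](d time.Duration, abort <-chan T) bool {
	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case <-abort:
		return true
	case <-timer.C:
		return false
	}
}

func main() {
	sig := make(chan os.Signal, 1)
	// SIGKILL is deliberately absent: it cannot be caught.
	signal.Notify(sig, os.Interrupt, syscall.SIGTERM)
	defer signal.Stop(sig)

	if waitOrAbort(2*time.Second, sig) {
		fmt.Println("aborted: revert the chaos here before exiting")
		return
	}
	fmt.Println("chaos duration elapsed")
}
```

Returning a boolean instead of calling os.Exit inside the select lets the caller run the same cleanup path for both outcomes.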


@ -0,0 +1,107 @@
package tc
import (
"errors"
"fmt"
"net"
"os/exec"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/network_latency/ip"
"github.com/litmuschaos/litmus-go/pkg/log"
)
func CreateDelayQdisc(PID int, latency float64, jitter float64) error {
if PID == 0 {
log.Error(fmt.Sprintf("[tc] Invalid PID=%d", PID))
return errors.New("Target PID cannot be zero")
}
if latency <= 0 {
log.Error(fmt.Sprintf("[tc] Invalid latency=%f", latency))
return errors.New("Latency should be a positive value")
}
iface, err := ip.InterfaceName(PID)
if err != nil {
return err
}
log.Info(fmt.Sprintf("[tc] CreateDelayQdisc: PID=%d interface=%s latency=%fs jitter=%fs", PID, iface, latency, jitter))
tc := fmt.Sprintf("nsenter -t %d -n tc qdisc add dev %s root handle 1: prio", PID, iface)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
if almostZero(jitter) {
// no jitter
tc = fmt.Sprintf("nsenter -t %d -n tc qdisc add dev %s parent 1:3 netem delay %fs", PID, iface, latency)
} else {
tc = fmt.Sprintf("nsenter -t %d -n tc qdisc add dev %s parent 1:3 netem delay %fs %fs", PID, iface, latency, jitter)
}
cmd = exec.Command("/bin/bash", "-c", tc)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
return nil
}
func AddIPFilter(PID int, IP net.IP, port int) error {
if PID == 0 {
log.Error(fmt.Sprintf("[tc] Invalid PID=%d", PID))
return errors.New("Target PID cannot be zero")
}
if port == 0 {
log.Error(fmt.Sprintf("[tc] Invalid Port=%d", port))
return errors.New("Port cannot be zero")
}
log.Info(fmt.Sprintf("[tc] AddIPFilter: Target PID=%d, destination IP=%s, destination Port=%d", PID, IP, port))
tc := fmt.Sprintf("nsenter -t %d -n tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst %s match ip dport %d 0xffff flowid 1:3", PID, IP, port)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
return nil
}
func Killnetem(PID int) error {
if PID == 0 {
log.Error(fmt.Sprintf("[tc] Invalid PID=%d", PID))
return errors.New("Target PID cannot be zero")
}
tc := fmt.Sprintf("nsenter -t %d -n tc qdisc delete dev eth0 root", PID)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
return nil
}
// float is complicated ¯\_(ツ)_/¯ it can't be compared to exact numbers due to
// variations in precision
func almostZero(f float64) bool {
return f < 0.0000001
}
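
CreateDelayQdisc appends the jitter argument to the netem command only when jitter is not effectively zero. A self-contained sketch of that command construction follows. The helper name `netemCommand` and the `epsilon` constant are assumptions, and, unlike almostZero above, the comparison uses the absolute value so that negative inputs are not treated as zero.

```go
package main

import (
	"fmt"
	"math"
)

// epsilon is the tolerance below which jitter is treated as zero.
const epsilon = 1e-7

// almostZero reports whether f is within epsilon of zero.
func almostZero(f float64) bool {
	return math.Abs(f) < epsilon
}

// netemCommand is a hypothetical helper that builds the shell command used to
// attach a netem delay qdisc inside the network namespace of process pid.
func netemCommand(pid int, iface string, latency, jitter float64) string {
	cmd := fmt.Sprintf("nsenter -t %d -n tc qdisc add dev %s parent 1:3 netem delay %fs", pid, iface, latency)
	if almostZero(jitter) {
		return cmd
	}
	return fmt.Sprintf("%s %fs", cmd, jitter)
}

func main() {
	fmt.Println(netemCommand(4242, "eth0", 0.2, 0.05))
}
```

Building the command as a plain string keeps the formatting logic testable without root privileges or a real network namespace.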


@ -1,80 +1,102 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-cpu-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeCPUHog contains preparation steps before chaos injection
func PrepareNodeCPUHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeCPUHogFault")
defer span.End()
var err error
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
// PrepareNodeCPUHog contains prepration steps before chaos injection
func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Node CPU Cores": experimentsDetails.NodeCPUcores,
"CPU Load": experimentsDetails.CPULoad,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
if experimentsDetails.AppNode == "" {
//Select node for kubelet-service-kill
appNodeName, err := common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
if err != nil {
return errors.Errorf("Unable to get the application nodename, err: %v", err)
}
experimentsDetails.AppNode = appNodeName
}
// When number of cpu cores for hogging is not defined , it will take it from node capacity
if experimentsDetails.NodeCPUcores == 0 {
err = SetCPUCapacity(experimentsDetails, clients)
if err != nil {
return err
}
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"NodeName": experimentsDetails.AppNode,
"NodeCPUcores": experimentsDetails.NodeCPUcores,
})
experimentsDetails.RunID = common.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Select node for node-cpu-hog
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
"Node Names": targetNodeList,
})
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.AppNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Creating the helper pod to perform node cpu hog
err = CreateHelperPod(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed due to, err: %v", err)
}
// Checking the status of application node
log.Info("[Status]: Getting the status of application node")
err = status.CheckNodeStatus(experimentsDetails.AppNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
log.Warn("Application node is not in the ready state, you may need to manually recover the node")
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
@ -85,194 +107,56 @@ func PrepareNodeCPUHog(ctx context.Context, experimentsDetails *experimentTypes.
return nil
}
// injectChaosInSerialMode stress the cpu of all the target nodes serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInSerialMode")
defer span.End()
nodeCPUCores := experimentsDetails.NodeCPUcores
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + appNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// When number of cpu cores for hogging is not defined , it will take it from node capacity
if nodeCPUCores == "0" {
if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not get node cpu capacity")
}
}
log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
"NodeName": appNode,
"NodeCPUCores": experimentsDetails.NodeCPUcores,
})
experimentsDetails.RunID = stringutils.GetRunID()
// Creating the helper pod to perform node cpu hog
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
}
}
return nil
}
// injectChaosInParallelMode stress the cpu of all the target nodes in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInParallelMode")
defer span.End()
nodeCPUCores := experimentsDetails.NodeCPUcores
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + appNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// When number of cpu cores for hogging is not defined , it will take it from node capacity
if nodeCPUCores == "0" {
if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not get node cpu capacity")
}
}
log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
"NodeName": appNode,
"NodeCPUcores": experimentsDetails.NodeCPUcores,
})
// Creating the helper pod to perform node cpu hog
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
//SetCPUCapacity fetch the node cpu capacity
func SetCPUCapacity(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
node, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.AppNode, v1.GetOptions{})
if err != nil {
return err
}
cpuCapacity, _ := node.Status.Capacity.Cpu().AsInt64()
experimentsDetails.NodeCPUcores = int(cpuCapacity)
return nil
}
// setCPUCapacity fetch the node cpu capacity
func setCPUCapacity(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets) error {
node, err := clients.GetNode(appNode, experimentsDetails.Timeout, experimentsDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNode), Reason: err.Error()}
}
experimentsDetails.NodeCPUcores = node.Status.Capacity.Cpu().String()
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeCPUHogFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
Name: experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName,
"name": experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNode,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: experimentsDetails.AppNode,
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
ImagePullPolicy: apiv1.PullAlways,
Command: []string{
"stress-ng",
},
Args: []string{
"--cpu",
experimentsDetails.NodeCPUcores,
"--cpu-load",
experimentsDetails.CPULoad,
strconv.Itoa(experimentsDetails.NodeCPUcores),
"--timeout",
strconv.Itoa(experimentsDetails.ChaosDuration),
},
Resources: chaosDetails.Resources,
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// setChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.NodeCPUcores = common.ValidateRange(experimentsDetails.NodeCPUcores)
experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)
experimentsDetails.NodesAffectedPerc = common.ValidateRange(experimentsDetails.NodesAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}


@ -1,53 +1,31 @@
package lib
import (
"context"
"bytes"
"fmt"
"os"
"os/exec"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-drain/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"github.com/pkg/errors"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var (
err error
inject, abort chan os.Signal
)
var err error
// PrepareNodeDrain contains the preparation steps before chaos injection
func PrepareNodeDrain(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeDrainFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//PrepareNodeDrain contains the prepration steps before chaos injection
func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -55,70 +33,90 @@ func PrepareNodeDrain(ctx context.Context, experimentsDetails *experimentTypes.E
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.TargetNode == "" {
if experimentsDetails.AppNode == "" {
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
appNodeName, err := common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node name")
return errors.Errorf("Unable to get the application nodename, err: %v", err)
}
experimentsDetails.AppNode = appNodeName
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.TargetNode + " node"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.AppNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)
// Drain the application node
if err := drainNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because error during draining of node")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return stacktrace.Propagate(err, "could not drain node")
err := DrainNode(experimentsDetails, clients)
if err != nil {
return err
}
// Verify the status of AUT after reschedule
log.Info("[Status]: Verify the status of AUT after reschedule")
if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because application status check failed")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return err
err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("Application status check failed, err: %v", err)
}
// Verify the status of Auxiliary Applications after reschedule
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running")
if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return err
err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("Auxiliary Applications status check failed, err: %v", err)
}
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
log.Info("[Chaos]: Stopping the experiment")
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
// updating the chaosresult after stopped
failStep := "Node Drain injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
if err = UncordonNode(experimentsDetails, clients); err != nil {
log.Errorf("unable to uncordon node, err :%v", err)
}
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
// Uncordon the application node
if err := uncordonNode(experimentsDetails, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not uncordon the target node")
err = UncordonNode(experimentsDetails, clients)
if err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@@ -129,106 +127,64 @@ func PrepareNodeDrain(ctx context.Context, experimentsDetails *experimentTypes.E
return nil
}
// drainNode drain the target node
func drainNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeDrainFault")
defer span.End()
// DrainNode drain the application node
func DrainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
log.Infof("[Inject]: Draining the %v node", experimentsDetails.TargetNode)
log.Infof("[Inject]: Draining the %v node", experimentsDetails.AppNode)
command := exec.Command("kubectl", "drain", experimentsDetails.TargetNode, "--ignore-daemonsets", "--delete-emptydir-data", "--force", "--timeout", strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), "failed to drain the target node", cerrors.ErrorTypeChaosInject); err != nil {
return err
}
common.SetTargets(experimentsDetails.TargetNode, "injected", "node", chaosDetails)
return retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
if !nodeSpec.Spec.Unschedulable {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: "node is not in unschedule state"}
}
return nil
})
}
return nil
}
// uncordonNode uncordon the application node
func uncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
targetNodes := strings.Split(experimentsDetails.TargetNode, ",")
for _, targetNode := range targetNodes {
//Check node exist before uncordon the node
_, err := clients.GetNode(targetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
if apierrors.IsNotFound(err) {
log.Infof("[Info]: The %v node is no longer exist, skip uncordon the node", targetNode)
common.SetTargets(targetNode, "noLongerExist", "node", chaosDetails)
continue
} else {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
}
}
log.Infof("[Recover]: Uncordon the %v node", targetNode)
command := exec.Command("kubectl", "uncordon", targetNode)
if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", targetNode), "failed to uncordon the target node", cerrors.ErrorTypeChaosInject); err != nil {
return err
}
common.SetTargets(targetNode, "reverted", "node", chaosDetails)
command := exec.Command("kubectl", "drain", experimentsDetails.AppNode, "--ignore-daemonsets", "--delete-local-data", "--force")
var out, stderr bytes.Buffer
command.Stdout = &out
command.Stderr = &stderr
if err := command.Run(); err != nil {
log.Infof("Error String: %v", stderr.String())
return fmt.Errorf("Unable to drain the %v node, err: %v", experimentsDetails.AppNode, err)
}
return retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
targetNodes := strings.Split(experimentsDetails.TargetNode, ",")
for _, targetNode := range targetNodes {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), targetNode, v1.GetOptions{})
if err != nil {
if apierrors.IsNotFound(err) {
continue
} else {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
}
}
if nodeSpec.Spec.Unschedulable {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: "target node is in unschedule state"}
}
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.AppNode, v1.GetOptions{})
if err != nil {
return err
}
if !nodeSpec.Spec.Unschedulable {
return errors.Errorf("%v node is not in unschedulable state", experimentsDetails.AppNode)
}
return nil
})
return nil
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails) {
// waiting till the abort signal received
<-abort
// UncordonNode uncordon the application node
func UncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err := uncordonNode(experimentsDetails, clients, chaosDetails); err != nil {
log.Errorf("Unable to uncordon the node, err: %v", err)
}
retry--
time.Sleep(1 * time.Second)
log.Infof("[Recover]: Uncordon the %v node", experimentsDetails.AppNode)
command := exec.Command("kubectl", "uncordon", experimentsDetails.AppNode)
var out, stderr bytes.Buffer
command.Stdout = &out
command.Stderr = &stderr
if err := command.Run(); err != nil {
log.Infof("Error String: %v", stderr.String())
return fmt.Errorf("Unable to uncordon the %v node, err: %v", experimentsDetails.AppNode, err)
}
log.Info("Chaos Revert Completed")
os.Exit(0)
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.AppNode, v1.GetOptions{})
if err != nil {
return err
}
if nodeSpec.Spec.Unschedulable {
return errors.Errorf("%v node is in unschedulable state", experimentsDetails.AppNode)
}
return nil
})
return nil
}


@@ -1,80 +1,94 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-io-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeIOStress contains preparation steps before chaos injection
func PrepareNodeIOStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeIOStressFault")
defer span.End()
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
// PrepareNodeIOStress contains prepration steps before chaos injection
func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
var err error
if experimentsDetails.AppNode == "" {
//Select node for node-io-stress
appNodeName, err := common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
if err != nil {
return errors.Errorf("Unable to get the application nodename, err: %v", err)
}
experimentsDetails.AppNode = appNodeName
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"NodeName": experimentsDetails.AppNode,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"CPU Core": experimentsDetails.CPU,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
experimentsDetails.RunID = common.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Select node for node-io-stress
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
"Node Names": targetNodeList,
})
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.AppNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Creating the helper pod to perform node io stress
err = CreateHelperPod(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed due to, err: %v", err)
}
// Checking the status of application node
log.Info("[Status]: Getting the status of application node")
err = status.CheckNodeStatus(experimentsDetails.AppNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
log.Warn("Application node is not in the ready state, you may need to manually recover the node")
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
@@ -85,196 +99,70 @@ func PrepareNodeIOStress(ctx context.Context, experimentsDetails *experimentType
return nil
}
// injectChaosInSerialMode stress the io of all the target nodes serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + appNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
"NodeName": appNode,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
})
experimentsDetails.RunID = stringutils.GetRunID()
// Creating the helper pod to perform node io stress
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode stress the io of all the target nodes in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + appNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
"NodeName": appNode,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
})
// Creating the helper pod to perform node io stress
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeIOStressFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
Name: experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName,
"name": experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNode,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: experimentsDetails.AppNode,
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
ImagePullPolicy: apiv1.PullAlways,
Command: []string{
"stress-ng",
"/stress-ng",
},
Args: getContainerArguments(experimentsDetails),
Resources: chaosDetails.Resources,
Args: GetContainerArguments(experimentsDetails),
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
// GetContainerArguments derives the args for the pumba stress helper pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
var hddbytes string
if experimentsDetails.FilesystemUtilizationBytes == "0" {
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
if experimentsDetails.FilesystemUtilizationBytes == 0 {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = "10%"
log.Info("Neither of FilesystemUtilizationPercentage or FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
}
} else {
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = experimentsDetails.FilesystemUtilizationBytes + "G"
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationBytes) + "G"
} else {
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
stressArgs := []string{
"--cpu",
experimentsDetails.CPU,
"--vm",
experimentsDetails.VMWorkers,
"--io",
experimentsDetails.NumberOfWorkers,
strconv.Itoa(experimentsDetails.NumberOfWorkers),
"--hdd",
experimentsDetails.NumberOfWorkers,
strconv.Itoa(experimentsDetails.NumberOfWorkers),
"--hdd-bytes",
hddbytes,
"--timeout",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--temp-path",
"/tmp",
}
return stressArgs
}
// setChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.FilesystemUtilizationBytes = common.ValidateRange(experimentsDetails.FilesystemUtilizationBytes)
experimentsDetails.FilesystemUtilizationPercentage = common.ValidateRange(experimentsDetails.FilesystemUtilizationPercentage)
experimentsDetails.CPU = common.ValidateRange(experimentsDetails.CPU)
experimentsDetails.VMWorkers = common.ValidateRange(experimentsDetails.VMWorkers)
experimentsDetails.NumberOfWorkers = common.ValidateRange(experimentsDetails.NumberOfWorkers)
experimentsDetails.NodesAffectedPerc = common.ValidateRange(experimentsDetails.NodesAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
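
Note on the --hdd-bytes selection: both versions of GetContainerArguments above apply the same precedence. A non-zero percentage always wins, a byte value is used only when the percentage is zero, and 10% is the fallback when neither is set. The following is a minimal sketch of that precedence using the integer-typed fields from the v1.8.2 side; hddBytes is a hypothetical helper name, not a function in the repository.

```go
package main

import (
	"fmt"
	"strconv"
)

// hddBytes (hypothetical helper) returns the value passed to stress-ng's
// --hdd-bytes flag. Percentage takes precedence over the byte value
// (in gigabytes); "10%" is the default when neither is provided.
func hddBytes(utilPercentage, utilBytes int) string {
	switch {
	case utilPercentage != 0:
		return strconv.Itoa(utilPercentage) + "%"
	case utilBytes != 0:
		return strconv.Itoa(utilBytes) + "G"
	default:
		return "10%"
	}
}

func main() {
	fmt.Println(hddBytes(0, 0))  // neither provided
	fmt.Println(hddBytes(20, 0)) // percentage only
	fmt.Println(hddBytes(0, 5))  // bytes only
	fmt.Println(hddBytes(20, 5)) // both provided: percentage wins
}
```

Flattening the nested if/else into a single switch keeps the three outcomes visible in one place; the warning and info log lines from the original are omitted here.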


@@ -1,80 +1,101 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeMemoryHog contains preparation steps before chaos injection
func PrepareNodeMemoryHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeMemoryHogFault")
defer span.End()
// PrepareNodeMemoryHog contains prepration steps before chaos injection
func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
if experimentsDetails.AppNode == "" {
//Select node for kubelet-service-kill
appNodeName, err := common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
if err != nil {
return errors.Errorf("Unable to get the application nodename, err: %v", err)
}
log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
"MemoryConsumptionMebibytes": experimentsDetails.MemoryConsumptionMebibytes,
"MemoryConsumptionPercentage": experimentsDetails.MemoryConsumptionPercentage,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
experimentsDetails.AppNode = appNodeName
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"NodeName": experimentsDetails.AppNode,
"MemoryHog Percentage": experimentsDetails.MemoryPercentage,
})
experimentsDetails.RunID = common.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Select node for node-memory-hog
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
"Node Names": targetNodeList,
})
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.AppNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
//Getting node memory details
memoryCapacity, memoryAllocatable, err := GetNodeMemoryDetails(experimentsDetails.AppNode, clients)
if err != nil {
return errors.Errorf("Unable to get the node memory details, err: %v", err)
}
// Get the total memory percentage wrt allocatable memory
experimentsDetails.MemoryPercentage = CalculateMemoryPercentage(experimentsDetails, clients, memoryCapacity, memoryAllocatable)
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Creating the helper pod to perform node memory hog
err = CreateHelperPod(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed due to, err: %v", err)
}
// Checking the status of application node
log.Info("[Status]: Getting the status of application node")
err = status.CheckNodeStatus(experimentsDetails.AppNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
log.Warn("Application node is not in the ready state, you may need to manually recover the node")
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, "name="+experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
@@ -85,259 +106,79 @@ func PrepareNodeMemoryHog(ctx context.Context, experimentsDetails *experimentTyp
return nil
}
// injectChaosInSerialMode stresses the memory of all the target nodes serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInSerialMode")
defer span.End()
// GetNodeMemoryDetails returns the total memory capacity and the allocatable memory of an application node
func GetNodeMemoryDetails(appNodeName string, clients clients.ClientSets) (int, int, error) {
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + appNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
"NodeName": appNode,
"Memory Consumption Percentage": experimentsDetails.MemoryConsumptionPercentage,
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
})
experimentsDetails.RunID = stringutils.GetRunID()
//Getting node memory details
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node memory details")
}
//Getting the exact memory value to exhaust
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil {
return stacktrace.Propagate(err, "could not calculate memory consumption value")
}
// Creating the helper pod to perform node memory hog
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
common.SetTargets(appNode, "targeted", "node", chaosDetails)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode stresses the memory of all the target nodes in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + appNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
"NodeName": appNode,
"Memory Consumption Percentage": experimentsDetails.MemoryConsumptionPercentage,
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
})
//Getting node memory details
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node memory details")
}
//Getting the exact memory value to exhaust
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil {
return stacktrace.Propagate(err, "could not calculate memory consumption value")
}
// Creating the helper pod to perform node memory hog
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
}
// getNodeMemoryDetails returns the total memory capacity and the allocatable memory of an application node
func getNodeMemoryDetails(appNodeName string, clients clients.ClientSets) (int, int, error) {
nodeDetails, err := clients.GetNode(appNodeName, 180, 2)
nodeDetails, err := clients.KubeClient.CoreV1().Nodes().Get(appNodeName, v1.GetOptions{})
if err != nil {
return 0, 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: err.Error()}
return 0, 0, err
}
memoryCapacity := int(nodeDetails.Status.Capacity.Memory().Value())
memoryAllocatable := int(nodeDetails.Status.Allocatable.Memory().Value())
if memoryCapacity == 0 || memoryAllocatable == 0 {
return memoryCapacity, memoryAllocatable, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: "failed to get memory details of the target node"}
return memoryCapacity, memoryAllocatable, errors.Errorf("Failed to get memory details of the application node")
}
return memoryCapacity, memoryAllocatable, nil
}
// calculateMemoryConsumption will calculate the amount of memory to be consumed for a given unit.
func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDetails, memoryCapacity, memoryAllocatable int) (string, error) {
// CalculateMemoryPercentage will calculate the memory percentage under chaos wrt allocatable memory
func CalculateMemoryPercentage(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, memoryCapacity, memoryAllocatable int) int {
var totalMemoryConsumption int
var MemoryConsumption string
var selector string
var totalMemoryPercentage int
if experimentsDetails.MemoryConsumptionMebibytes == "0" {
if experimentsDetails.MemoryConsumptionPercentage == "0" {
log.Info("Neither MemoryConsumptionPercentage nor MemoryConsumptionMebibytes provided, proceeding with a default MemoryConsumptionPercentage value of 30%")
return "30%", nil
}
selector = "percentage"
} else {
if experimentsDetails.MemoryConsumptionPercentage == "0" {
selector = "mebibytes"
} else {
log.Warn("Both MemoryConsumptionPercentage & MemoryConsumptionMebibytes provided as inputs, using the MemoryConsumptionPercentage value to proceed with the experiment")
selector = "percentage"
}
}
//Getting the total memory under chaos
memoryForChaos := ((float64(experimentsDetails.MemoryPercentage) / 100) * float64(memoryCapacity))
switch selector {
//Get the percentage of memory under chaos wrt allocatable memory
totalMemoryPercentage = int((float64(memoryForChaos) / float64(memoryAllocatable)) * 100)
case "percentage":
log.Infof("[Info]: PercentageOfMemoryCapacityToBeUsed: %d, which is %d percent of Allocatable Memory", experimentsDetails.MemoryPercentage, totalMemoryPercentage)
//Getting the total memory under chaos
memoryConsumptionPercentage, _ := strconv.ParseFloat(experimentsDetails.MemoryConsumptionPercentage, 64)
memoryForChaos := (memoryConsumptionPercentage / 100) * float64(memoryCapacity)
//Get the percentage of memory under chaos wrt allocatable memory
totalMemoryConsumption = int((memoryForChaos / float64(memoryAllocatable)) * 100)
if totalMemoryConsumption > 100 {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is more than 100 percent (%d percent) of Allocatable Memory, so the experiment will only consume up to 100 percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
MemoryConsumption = "100%"
} else {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is %d percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
MemoryConsumption = strconv.Itoa(totalMemoryConsumption) + "%"
}
return MemoryConsumption, nil
case "mebibytes":
// Bringing all the values in Ki unit to compare
// since 1Mi = 1024Ki
memoryConsumptionMebibytes, _ := strconv.ParseFloat(experimentsDetails.MemoryConsumptionMebibytes, 64)
TotalMemoryConsumption := memoryConsumptionMebibytes * 1024
// since 1Ki = 1024 bytes
memoryAllocatable := memoryAllocatable / 1024
if memoryAllocatable < int(TotalMemoryConsumption) {
MemoryConsumption = strconv.Itoa(memoryAllocatable) + "k"
log.Infof("[Info]: The memory for consumption %vKi is more than the available memory %vKi, so the experiment will hog the memory up to %vKi", int(TotalMemoryConsumption), memoryAllocatable, memoryAllocatable)
} else {
MemoryConsumption = experimentsDetails.MemoryConsumptionMebibytes + "m"
}
return MemoryConsumption, nil
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: "specify the memory consumption value either in percentage or mebibytes in a non-decimal format using respective envs"}
return totalMemoryPercentage
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, MemoryConsumption string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeMemoryHogFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
Name: experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName,
"name": experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNode,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: experimentsDetails.AppNode,
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
ImagePullPolicy: apiv1.PullAlways,
Command: []string{
"stress-ng",
},
Args: []string{
"--vm",
experimentsDetails.NumberOfWorkers,
"1",
"--vm-bytes",
MemoryConsumption,
strconv.Itoa(experimentsDetails.MemoryPercentage) + "%",
"--timeout",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
},
Resources: chaosDetails.Resources,
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// setChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.MemoryConsumptionMebibytes = common.ValidateRange(experimentsDetails.MemoryConsumptionMebibytes)
experimentsDetails.MemoryConsumptionPercentage = common.ValidateRange(experimentsDetails.MemoryConsumptionPercentage)
experimentsDetails.NumberOfWorkers = common.ValidateRange(experimentsDetails.NumberOfWorkers)
experimentsDetails.NodesAffectedPerc = common.ValidateRange(experimentsDetails.NodesAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
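
A minimal sketch of the selection logic in `calculateMemoryConsumption` above, using only the standard library. The name `vmBytes` is hypothetical and not part of litmus-go; the inputs are plain strings and byte counts rather than the `ExperimentDetails` struct, and the sketch assumes 1 Mi = 1024 Ki. It returns the string that would be passed to the `--vm-bytes` argument of `stress-ng`.

```go
package main

import (
	"fmt"
	"strconv"
)

// vmBytes is a hypothetical helper (not part of litmus-go). It picks between
// the percentage and mebibytes inputs and caps the result at the node's
// allocatable memory. capacity and allocatable are in bytes.
func vmBytes(percentage, mebibytes string, capacity, allocatable int) (string, error) {
	if allocatable <= 0 {
		return "", fmt.Errorf("allocatable memory must be positive, got %d", allocatable)
	}
	switch {
	case percentage == "0" && mebibytes == "0":
		// Neither input provided: fall back to the default of 30%.
		return "30%", nil
	case percentage != "0":
		// The percentage input wins, including when both inputs are set.
		p, err := strconv.ParseFloat(percentage, 64)
		if err != nil {
			return "", fmt.Errorf("invalid percentage %q: %w", percentage, err)
		}
		// Convert "percent of capacity" into "percent of allocatable".
		ofAllocatable := int((p / 100) * float64(capacity) / float64(allocatable) * 100)
		if ofAllocatable > 100 {
			ofAllocatable = 100
		}
		return strconv.Itoa(ofAllocatable) + "%", nil
	default:
		m, err := strconv.ParseFloat(mebibytes, 64)
		if err != nil {
			return "", fmt.Errorf("invalid mebibytes %q: %w", mebibytes, err)
		}
		requestedKi := int(m * 1024)        // assumption: 1 Mi = 1024 Ki
		allocatableKi := allocatable / 1024 // 1 Ki = 1024 bytes
		if allocatableKi < requestedKi {
			// Request exceeds what the node can give: cap at allocatable.
			return strconv.Itoa(allocatableKi) + "k", nil
		}
		return mebibytes + "m", nil
	}
}

func main() {
	const capacity, allocatable = 8 << 20, 4 << 20 // illustrative byte counts
	for _, in := range [][2]string{{"0", "0"}, {"25", "0"}, {"0", "2"}, {"0", "8"}} {
		v, err := vmBytes(in[0], in[1], capacity, allocatable)
		fmt.Println(in, v, err)
	}
}
```

Unlike the function in the diff, the sketch surfaces parse errors instead of discarding them; that is a design choice of the example, not a description of the upstream behaviour.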

@@ -1,213 +0,0 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-restart/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
const (
secretName string = "id-rsa"
privateKeyMount string = "/mnt"
privateKeyPath string = "/mnt/ssh-privatekey"
emptyDirMount string = "/data"
emptyDirPath string = "/data/ssh-privatekey"
privateKeySecret string = "private-key-cm-"
emptyDirVolume string = "empty-dir-"
ObjectNameField = "metadata.name"
)
// PrepareNodeRestart contains preparation steps before chaos injection
func PrepareNodeRestart(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeRestartFault")
defer span.End()
//Select the node
if experimentsDetails.TargetNode == "" {
//Select node for node-restart
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node name")
}
}
// get the node ip
if experimentsDetails.TargetNodeIP == "" {
experimentsDetails.TargetNodeIP, err = getInternalIP(experimentsDetails.TargetNode, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get internal ip")
}
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Node": experimentsDetails.TargetNode,
"Target Node IP": experimentsDetails.TargetNodeIP,
})
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", strconv.Itoa(experimentsDetails.RampTime))
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.TargetNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
// Creating the helper pod to perform node restart
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
}
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", strconv.Itoa(experimentsDetails.RampTime))
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, clients clients.ClientSets) error {
// This method attaches an emptyDir volume alongside the secret volume and copies the key from the secret
// into the emptyDir, because the secret is mounted read-only with 777 permissions that cannot be changed;
// see https://github.com/kubernetes/kubernetes/issues/57923
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeRestartFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Affinity: &apiv1.Affinity{
NodeAffinity: &apiv1.NodeAffinity{
RequiredDuringSchedulingIgnoredDuringExecution: &apiv1.NodeSelector{
NodeSelectorTerms: []apiv1.NodeSelectorTerm{
{
MatchFields: []apiv1.NodeSelectorRequirement{
{
Key: ObjectNameField,
Operator: apiv1.NodeSelectorOpNotIn,
Values: []string{experimentsDetails.TargetNode},
},
},
},
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/sh",
},
Args: []string{"-c", fmt.Sprintf("cp %[1]s %[2]s && chmod 400 %[2]s && ssh -o \"StrictHostKeyChecking=no\" -o \"UserKnownHostsFile=/dev/null\" -i %[2]s %[3]s@%[4]s %[5]s", privateKeyPath, emptyDirPath, experimentsDetails.SSHUser, experimentsDetails.TargetNodeIP, experimentsDetails.RebootCommand)},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: privateKeySecret + experimentsDetails.RunID,
MountPath: privateKeyMount,
},
{
Name: emptyDirVolume + experimentsDetails.RunID,
MountPath: emptyDirMount,
},
},
},
},
Volumes: []apiv1.Volume{
{
Name: privateKeySecret + experimentsDetails.RunID,
VolumeSource: apiv1.VolumeSource{
Secret: &apiv1.SecretVolumeSource{
SecretName: secretName,
},
},
},
{
Name: emptyDirVolume + experimentsDetails.RunID,
VolumeSource: apiv1.VolumeSource{
EmptyDir: &apiv1.EmptyDirVolumeSource{},
},
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getInternalIP gets the internal ip of the given node
func getInternalIP(nodeName string, clients clients.ClientSets) (string, error) {
node, err := clients.GetNode(nodeName, 180, 2)
if err != nil {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: err.Error()}
}
for _, addr := range node.Status.Addresses {
if strings.ToLower(string(addr.Type)) == "internalip" {
return addr.Address, nil
}
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: "failed to get the internal ip of the target node"}
}
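
The helper container's `Args` above rely on `fmt.Sprintf` indexed verbs (`%[2]s` and so on) so that the copied key path can be reused three times without repeating the argument. A minimal sketch of that command construction follows; `rebootCommand` and the sample values in `main` are hypothetical and only illustrate the formatting.

```go
package main

import "fmt"

// rebootCommand is a hypothetical helper (not part of litmus-go) that builds
// the shell command run by the helper pod: copy the key out of the read-only
// secret mount, tighten its permissions, then run the reboot command over SSH.
func rebootCommand(keySrc, keyDst, user, nodeIP, reboot string) string {
	// %[n]s selects the n-th argument, so keyDst (%[2]s) is reused for
	// cp, chmod and ssh -i without being passed three times.
	return fmt.Sprintf(
		"cp %[1]s %[2]s && chmod 400 %[2]s && ssh -o \"StrictHostKeyChecking=no\" -o \"UserKnownHostsFile=/dev/null\" -i %[2]s %[3]s@%[4]s %[5]s",
		keySrc, keyDst, user, nodeIP, reboot,
	)
}

func main() {
	// Sample values are illustrative only.
	fmt.Println(rebootCommand("/mnt/ssh-privatekey", "/data/ssh-privatekey", "root", "10.0.0.5", "sudo reboot"))
}
```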

@@ -1,7 +1,6 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
@@ -9,41 +8,23 @@ import (
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-taint/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var (
err error
inject, abort chan os.Signal
)
var err error
// PrepareNodeTaint contains the preparation steps before chaos injection
func PrepareNodeTaint(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeTaintFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// PrepareNodeTaint contains the preparation steps before chaos injection
func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -51,65 +32,90 @@ func PrepareNodeTaint(ctx context.Context, experimentsDetails *experimentTypes.E
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.TargetNode == "" {
if experimentsDetails.AppNode == "" {
//Select node for node-taint
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
appNodeName, err := common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node name")
return errors.Errorf("Unable to get the application nodename, err: %v", err)
}
experimentsDetails.AppNode = appNodeName
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.TargetNode + " node"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.AppNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)
// taint the application node
if err := taintNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not taint node")
err := TaintNode(experimentsDetails, clients)
if err != nil {
return err
}
// Verify the status of AUT after reschedule
log.Info("[Status]: Verify the status of AUT after reschedule")
if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
return err
err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("Application status check failed due to, err: %v", err)
}
// Verify the status of Auxiliary Applications after reschedule
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running")
if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
return err
err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("Auxiliary Applications status check failed, err: %v", err)
}
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
// SIGKILL cannot be caught by a process, so it is not registered here.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
log.Info("[Chaos]: Stopping the experiment")
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
// updating the chaosresult after stopped
failStep := "Node Taint injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
if err = RemoveTaintFromNode(experimentsDetails, clients); err != nil {
log.Errorf("unable to remove taint from the node, err :%v", err)
}
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
// remove taint from the application node
if err := removeTaintFromNode(experimentsDetails, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not remove taint from node")
err = RemoveTaintFromNode(experimentsDetails, clients)
if err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@@ -120,136 +126,105 @@ func PrepareNodeTaint(ctx context.Context, experimentsDetails *experimentTypes.E
return nil
}
// taintNode taints the application node
func taintNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeTaintFault")
defer span.End()
// TaintNode taints the application node
func TaintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
// get the taint labels & effect
taintKey, taintValue, taintEffect := getTaintDetails(experimentsDetails)
log.Infof("Add %v taints to the %v node", taintKey+"="+taintValue+":"+taintEffect, experimentsDetails.TargetNode)
TaintKey, TaintValue, TaintEffect := GetTaintDetails(experimentsDetails)
// get the node details
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
node, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.AppNode, v1.GetOptions{})
if err != nil || node == nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.AppNode, err)
}
// check if the taint already exists
tainted := false
for _, taint := range node.Spec.Taints {
if taint.Key == taintKey {
if taint.Key == TaintKey {
tainted = true
break
}
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
if !tainted {
node.Spec.Taints = append(node.Spec.Taints, apiv1.Taint{
Key: taintKey,
Value: taintValue,
Effect: apiv1.TaintEffect(taintEffect),
})
if !tainted {
node.Spec.Taints = append(node.Spec.Taints, apiv1.Taint{
Key: TaintKey,
Value: TaintValue,
Effect: apiv1.TaintEffect(TaintEffect),
})
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to add taints: %s", err.Error())}
}
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(node)
if err != nil || updatedNodeWithTaint == nil {
return fmt.Errorf("failed to update %v node after adding taints, err: %v", experimentsDetails.AppNode, err)
}
common.SetTargets(node.Name, "injected", "node", chaosDetails)
log.Infof("Successfully added taint in %v node", experimentsDetails.TargetNode)
}
log.Infof("Successfully added taint in %v node", experimentsDetails.AppNode)
return nil
}
// removeTaintFromNode remove the taint from the application node
func removeTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// RemoveTaintFromNode remove the taint from the application node
func RemoveTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
// Get the taint key
taintLabel := strings.Split(experimentsDetails.Taints, ":")
taintKey := strings.Split(taintLabel[0], "=")[0]
TaintLabel := strings.Split(experimentsDetails.Taints, ":")
TaintKey := strings.Split(TaintLabel[0], "=")[0]
// get the node details
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
node, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.AppNode, v1.GetOptions{})
if err != nil || node == nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.AppNode, err)
}
// check if the taint already exists
tainted := false
for _, taint := range node.Spec.Taints {
if taint.Key == taintKey {
if taint.Key == TaintKey {
tainted = true
break
}
}
if tainted {
var newTaints []apiv1.Taint
var Newtaints []apiv1.Taint
// remove all the taints with matching key
for _, taint := range node.Spec.Taints {
if taint.Key != taintKey {
newTaints = append(newTaints, taint)
if taint.Key != TaintKey {
Newtaints = append(Newtaints, taint)
}
}
node.Spec.Taints = newTaints
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to remove taints: %s", err.Error())}
node.Spec.Taints = Newtaints
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(node)
if err != nil || updatedNodeWithTaint == nil {
return fmt.Errorf("failed to update %v node after removing taints, err: %v", experimentsDetails.AppNode, err)
}
}
common.SetTargets(node.Name, "reverted", "node", chaosDetails)
log.Infof("Successfully removed taint from the %v node", node.Name)
return nil
}
// GetTaintDetails return the key, value and effect for the taint
func getTaintDetails(experimentsDetails *experimentTypes.ExperimentDetails) (string, string, string) {
taintValue := "node-taint"
taintEffect := string(apiv1.TaintEffectNoExecute)
func GetTaintDetails(experimentsDetails *experimentTypes.ExperimentDetails) (string, string, string) {
TaintValue := "node-taint"
TaintEffect := string(apiv1.TaintEffectNoExecute)
taints := strings.Split(experimentsDetails.Taints, ":")
taintLabel := strings.Split(taints[0], "=")
taintKey := taintLabel[0]
Taints := strings.Split(experimentsDetails.Taints, ":")
TaintLabel := strings.Split(Taints[0], "=")
TaintKey := TaintLabel[0]
// It will set the value for taint label from `TAINT` env, if provided
// otherwise it will use the `node-taint` value as default value.
if len(taintLabel) >= 2 {
taintValue = taintLabel[1]
if len(TaintLabel) >= 2 {
TaintValue = TaintLabel[1]
}
// It will set the value for taint effect from `TAINT` env, if provided
// otherwise it will use `NoExecute` value as default value.
if len(taints) >= 2 {
taintEffect = taints[1]
if len(Taints) >= 2 {
TaintEffect = Taints[1]
}
return taintKey, taintValue, taintEffect
}
return TaintKey, TaintValue, TaintEffect
// abortWatcher continuously watch for the abort signals
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails) {
// waiting till the abort signal received
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err := removeTaintFromNode(experimentsDetails, clients, chaosDetails); err != nil {
log.Errorf("Unable to untaint node, err: %v", err)
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(0)
}
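
Note: the taint helpers above parse the `Taints` field as `key[=value][:effect]`, defaulting to `node-taint` and `NoExecute` when a part is missing. The following is a minimal standalone sketch of that parsing rule, not the repository's code. The name `parseTaint` is hypothetical, and the effect default is written as the plain string "NoExecute" (the value of `apiv1.TaintEffectNoExecute`) so the sketch does not need the Kubernetes API package.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTaint is a hypothetical helper that mirrors the splitting logic of
// getTaintDetails in the diff above: "key[=value][:effect]".
func parseTaint(spec string) (key, value, effect string) {
	// Defaults used when the spec omits the value or the effect.
	value = "node-taint"
	effect = "NoExecute"

	// Split off the effect first, then split the label into key and value.
	parts := strings.Split(spec, ":")
	label := strings.Split(parts[0], "=")
	key = label[0]

	if len(label) >= 2 {
		value = label[1]
	}
	if len(parts) >= 2 {
		effect = parts[1]
	}
	return key, value, effect
}

func main() {
	specs := []string{"dedicated", "dedicated=infra", "dedicated=infra:NoSchedule"}
	for _, spec := range specs {
		k, v, e := parseTaint(spec)
		fmt.Printf("%q -> key=%s value=%s effect=%s\n", spec, k, v, e)
	}
}
```

Because only the key is compared when checking for an existing taint or removing one, a spec that shares a key with an unrelated taint on the node would match it as well.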


@ -1,43 +1,29 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-autoscaler/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/math"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/sirupsen/logrus"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
appsv1 "k8s.io/client-go/kubernetes/typed/apps/v1"
retries "k8s.io/client-go/util/retry"
"github.com/pkg/errors"
)
var (
err error
appsv1DeploymentClient appsv1.DeploymentInterface
appsv1StatefulsetClient appsv1.StatefulSetInterface
)
var err error
// PreparePodAutoscaler contains the preparation steps and chaos injection steps
func PreparePodAutoscaler(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodAutoscalerFault")
defer span.End()
//PreparePodAutoscaler contains the preparation steps before chaos injection
func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
appName, replicaCount, err := GetApplicationDetails(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to get the relicaCount of the application, err: %v", err)
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -45,67 +31,14 @@ func PreparePodAutoscaler(ctx context.Context, experimentsDetails *experimentTyp
common.WaitForDuration(experimentsDetails.RampTime)
}
// initialise the resource clients
appsv1DeploymentClient = clients.KubeClient.AppsV1().Deployments(experimentsDetails.AppNS)
appsv1StatefulsetClient = clients.KubeClient.AppsV1().StatefulSets(experimentsDetails.AppNS)
err = PodAutoscalerChaos(experimentsDetails, clients, replicaCount, appName, resultDetails, eventsDetails, chaosDetails)
if err != nil {
return errors.Errorf("Unable to perform autoscaling, err: %v", err)
}
switch strings.ToLower(experimentsDetails.AppKind) {
case "deployment", "deployments":
appsUnderTest, err := getDeploymentDetails(experimentsDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get deployment details")
}
deploymentList := []string{}
for _, deployment := range appsUnderTest {
deploymentList = append(deploymentList, deployment.AppName)
}
log.InfoWithValues("[Info]: Details of Deployments under chaos injection", logrus.Fields{
"Number Of Deployment": len(deploymentList),
"Target Deployments": deploymentList,
})
//calling go routine which will continuously watch for the abort signal
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err = podAutoscalerChaosInDeployment(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not scale deployment")
}
if err = autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert scaling in deployment")
}
case "statefulset", "statefulsets":
appsUnderTest, err := getStatefulsetDetails(experimentsDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get statefulset details")
}
var stsList []string
for _, sts := range appsUnderTest {
stsList = append(stsList, sts.AppName)
}
log.InfoWithValues("[Info]: Details of Statefulsets under chaos injection", logrus.Fields{
"Number Of Statefulsets": len(stsList),
"Target Statefulsets": stsList,
})
//calling go routine which will continuously watch for the abort signal
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err = podAutoscalerChaosInStatefulset(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not scale statefulset")
}
if err = autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert scaling in statefulset")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{kind: %s}", experimentsDetails.AppKind), Reason: "application type is not supported"}
err = AutoscalerRecovery(experimentsDetails, clients, replicaCount, appName)
if err != nil {
return errors.Errorf("Unable to recover the auto scaling, err: %v", err)
}
//Waiting for the ramp time after chaos injection
@ -116,329 +49,139 @@ func PreparePodAutoscaler(ctx context.Context, experimentsDetails *experimentTyp
return nil
}
func getSliceOfTotalApplicationsTargeted(appList []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails) []experimentTypes.ApplicationUnderTest {
//GetApplicationDetails is used to get the name and total number of replicas of the application
func GetApplicationDetails(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, int, error) {
var appReplica int
var appName string
// Get Deployment replica count
applicationList, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.AppNS).List(metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(applicationList.Items) == 0 {
return "", 0, errors.Errorf("Unable to list the application, err: %v", err)
}
for _, app := range applicationList.Items {
appReplica = int(*app.Spec.Replicas)
appName = app.Name
}
return appName, appReplica, nil
newAppListLength := math.Maximum(1, math.Adjustment(math.Minimum(experimentsDetails.AppAffectPercentage, 100), len(appList)))
return appList[:newAppListLength]
}
// getDeploymentDetails is used to get the name and total number of replicas of the deployment
func getDeploymentDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
//PodAutoscalerChaos scales up the application pod replicas
func PodAutoscalerChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, replicaCount int, appName string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
deploymentList, err := appsv1DeploymentClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(deploymentList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: "no deployment found with matching labels"}
}
var appsUnderTest []experimentTypes.ApplicationUnderTest
for _, app := range deploymentList.Items {
log.Infof("[Info]: Found deployment name '%s' with replica count '%d'", app.Name, int(*app.Spec.Replicas))
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
}
// Applying the APP_AFFECTED_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
}
// getStatefulsetDetails is used to get the name and total number of replicas of the statefulsets
func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
statefulsetList, err := appsv1StatefulsetClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(statefulsetList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: "no statefulset found with matching labels"}
}
appsUnderTest := []experimentTypes.ApplicationUnderTest{}
for _, app := range statefulsetList.Items {
log.Infof("[Info]: Found statefulset name '%s' with replica count '%d'", app.Name, int(*app.Spec.Replicas))
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
}
// Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
}
// podAutoscalerChaosInDeployment scales up the replicas of deployment and verify the status
func podAutoscalerChaosInDeployment(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
applicationClient := clients.KubeClient.AppsV1().Deployments(experimentsDetails.AppNS)
replicas := int32(experimentsDetails.Replicas)
// Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
for _, app := range appsUnderTest {
// Retrieve the latest version of Deployment before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
}
// modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
log.Infof("Updating deployment '%s' to number of replicas '%d'", appUnderTest.ObjectMeta.Name, experimentsDetails.Replicas)
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale deployment :%s", err.Error())}
}
common.SetTargets(app.AppName, "injected", "deployment", chaosDetails)
// Retrieve the latest version of Deployment before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := applicationClient.Get(appName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Failed to get latest version of Application Deployment, err: %v", err)
}
return nil
// modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(replicas)
_, updateErr := applicationClient.Update(appUnderTest)
return updateErr
})
if retryErr != nil {
return retryErr
return errors.Errorf("Unable to scale the application, err: %v", retryErr)
}
log.Info("[Info]: The application started scaling")
return deploymentStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
}
// podAutoscalerChaosInStatefulset scales up the replicas of statefulset and verify the status
func podAutoscalerChaosInStatefulset(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
for _, app := range appsUnderTest {
// Retrieve the latest version of Statefulset before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
}
// modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale statefulset :%s", err.Error())}
}
common.SetTargets(app.AppName, "injected", "statefulset", chaosDetails)
}
return nil
})
if retryErr != nil {
return retryErr
}
log.Info("[Info]: The application started scaling")
return statefulsetStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
}
// deploymentStatusCheck check the status of deployment and verify the available replicas
func deploymentStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
deployment, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(deployment.Status.ReadyReplicas) != experimentsDetails.Replicas {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale deployment, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, deployment.Status.ReadyReplicas)}
}
}
return nil
})
log.Info("Application Started Scaling")
err = ApplicationPodStatusCheck(experimentsDetails, appName, clients, replicaCount, resultDetails, eventsDetails, chaosDetails)
if err != nil {
if scaleErr := autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
}
return stacktrace.Propagate(err, "failed to scale replicas")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
if duration < experimentsDetails.ChaosDuration {
log.Info("[Wait]: Waiting for completion of chaos duration")
time.Sleep(time.Duration(experimentsDetails.ChaosDuration-duration) * time.Second)
return errors.Errorf("Status Check failed, err: %v", err)
}
return nil
}
// statefulsetStatusCheck check the status of statefulset and verify the available replicas
func statefulsetStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// ApplicationPodStatusCheck checks the status of the application pod
func ApplicationPodStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets, replicaCount int, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
applicationClient := clients.KubeClient.AppsV1().Deployments(experimentsDetails.AppNS)
isFailed := false
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
err := retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
statefulset, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(statefulset.Status.ReadyReplicas) != experimentsDetails.Replicas {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale statefulset, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, statefulset.Status.ReadyReplicas)}
}
applicationDeploy, err := applicationClient.Get(appName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to get the application, err: %v", err)
}
if int(applicationDeploy.Status.AvailableReplicas) != experimentsDetails.Replicas {
log.Infof("Application Pod Available Count is: %v", applicationDeploy.Status.AvailableReplicas)
isFailed = true
return errors.Errorf("Application is not scaled yet, err: %v", err)
}
isFailed = false
return nil
})
if err != nil {
if scaleErr := autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
if isFailed {
err = AutoscalerRecovery(experimentsDetails, clients, replicaCount, appName)
if err != nil {
return errors.Errorf("Unable to perform autoscaling, err: %v", err)
}
return stacktrace.Propagate(err, "failed to scale replicas")
return errors.Errorf("Failed to scale the appplication, err: %v", err)
} else if err != nil {
return err
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
if duration < experimentsDetails.ChaosDuration {
log.Info("[Wait]: Waiting for completion of chaos duration")
time.Sleep(time.Duration(experimentsDetails.ChaosDuration-duration) * time.Second)
}
// Keeping a wait time of 10s after all pod comes in running state
// This is optional and used just for viewing the pod status
time.Sleep(10 * time.Second)
return nil
}
// autoscalerRecoveryInDeployment rollback the replicas to initial values in deployment
func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
//AutoscalerRecovery scale back to initial number of replica
func AutoscalerRecovery(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, replicaCount int, appName string) error {
applicationClient := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace)
// Scale back to initial number of replicas
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
// Retrieve the latest version of Deployment before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
for _, app := range appsUnderTest {
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in deployment :%s", err.Error())}
}
common.SetTargets(app.AppName, "reverted", "deployment", chaosDetails)
appUnderTest, err := applicationClient.Get(appName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Failed to get latest version of Application Deployment, err: %v", err)
}
return nil
})
if retryErr != nil {
return retryErr
}
log.Info("[Info]: Application started rolling back to original replica count")
return retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
applicationDeploy, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("[Info]: Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback deployment scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
}
}
log.Info("[RollBack]: Application rollback to the initial number of replicas")
return nil
})
}
// autoscalerRecoveryInStatefulset rollback the replicas to initial values in deployment
func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
// Scale back to initial number of replicas
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
for _, app := range appsUnderTest {
// Retrieve the latest version of Statefulset before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in statefulset :%s", err.Error())}
}
common.SetTargets(app.AppName, "reverted", "statefulset", chaosDetails)
}
return nil
appUnderTest.Spec.Replicas = int32Ptr(int32(replicaCount)) // modify replica count
_, updateErr := applicationClient.Update(appUnderTest)
return updateErr
})
if retryErr != nil {
return retryErr
return errors.Errorf("Unable to scale the, err: %v", retryErr)
}
log.Info("[Info]: Application pod started rolling back")
return retry.
err = retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
applicationDeploy, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback statefulset scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
}
applicationDeploy, err := applicationClient.Get(appName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to get the application, err: %v", err)
}
if int(applicationDeploy.Status.AvailableReplicas) != experimentsDetails.Replicas {
log.Infof("Application Pod Available Count is: %v", applicationDeploy.Status.AvailableReplicas)
return errors.Errorf("Unable to roll back to older replica count, err: %v", err)
}
log.Info("[RollBack]: Application roll back to initial number of replicas")
return nil
})
if err != nil {
return err
}
log.Info("[RollBack]: Application Pod roll back to initial number of replicas")
return nil
}
func int32Ptr(i int32) *int32 { return &i }
// abortPodAutoScalerChaos go routine will continuously watch for the abort signal for the entire chaos duration and generate the required events and result
func abortPodAutoScalerChaos(appsUnderTest []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) {
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
// waiting till the abort signal received
<-signChan
log.Info("[Chaos]: Revert Started")
// Note that we are attempting recovery (in this case scaling down to original replica count) after ..
// .. the tasks to patch results & generate events. This is so because the func AutoscalerRecovery..
// ..takes more time to complete - it involves a status check post the downscale. We have a period of ..
// .. few seconds before the pod deletion/removal occurs from the time the TERM is caught and thereby..
// ..run the risk of not updating the status of the objects/create events. With the current approach..
// ..tests indicate we succeed with the downscale/patch call, even if the status checks take longer
// As such, this is a workaround, and other solutions such as usage of pre-stop hooks etc., need to be explored
// Other experiments have simpler "recoveries" that are more or less guaranteed to work.
switch strings.ToLower(experimentsDetails.AppKind) {
case "deployment", "deployments":
if err := autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
log.Errorf("the recovery after abortion failed err: %v", err)
}
case "statefulset", "statefulsets":
if err := autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
log.Errorf("the recovery after abortion failed err: %v", err)
}
default:
log.Errorf("application type '%s' is not supported for the chaos", experimentsDetails.AppKind)
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
}

@@ -1,329 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
var inject chan os.Signal
// PrepareCPUExecStress contains the chaos preparation and injection steps
func PrepareCPUExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodCPUHogExecFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
if err := experimentCPU(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not stress cpu")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressCPU uses the REST API to exec into the target container of the target pod.
// The stress command keeps increasing the CPU utilisation until it reaches the maximum available or allowed number.
// TOTAL_CHAOS_DURATION specifies how long the experiment lasts.
func stressCPU(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosInjectCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
// experimentCPU function orchestrates the experiment by calling the stressCPU function for every core, of every container, of every pod that is targeted
func experimentCPU(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the CPU of all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-inject:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range targetPodList.Items {
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"CPU CORE": experimentsDetails.CPUcores,
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// any error other than exit code 137 received while executing the stress command stops the execution and marks the result as fail
// exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as pass
// an OOM kill occurs if the memory to be stressed exceeds the resource limit of the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert cpu stress")
}
}
}
return nil
}
// injectChaosInParallelMode stresses the CPU of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-inject:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"CPU CORE": experimentsDetails.CPUcores,
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
}
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// any error other than exit code 137 received while executing the stress command stops the execution and marks the result as fail
// exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as pass
// an OOM kill occurs if the memory to be stressed exceeds the resource limit of the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressCPUParallel(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
return killStressCPUParallel(experimentsDetails, targetPodList, clients, chaosDetails)
}
// killStressCPUSerial function to kill a stress process running inside target container
//
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosKillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, ns), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil
}
// killStressCPUParallel function to kill all the stress processes running inside the target containers
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressCPUParallel(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}

@@ -0,0 +1,260 @@
package lib
import (
"os"
"os/signal"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/klog"
)
// StressCPU uses the REST API to exec into the target container of the target pod.
// The stress command keeps increasing the CPU utilisation until it reaches the maximum available or allowed number.
// TOTAL_CHAOS_DURATION specifies how long the experiment lasts.
func StressCPU(containerName, podName, namespace, cpuHogCmd string, clients clients.ClientSets) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", cpuHogCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to run stress command inside target container, err: %v", err)
}
return nil
}
//ExperimentCPU function orchestrates the experiment by calling the StressCPU function for every core, of every container, of every pod that is targeted
func ExperimentCPU(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
}
return nil
}
// InjectChaosInSerialMode stresses the CPU of all target applications serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"container": experimentsDetails.TargetContainer,
"Pod": pod.Name,
"CPU CORE": experimentsDetails.CPUcores,
})
go StressCPU(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosInjectCmd, clients)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
err := KillStressCPUSerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients)
if err != nil {
klog.V(0).Infof("Error in Kill stress after abortion")
return err
}
// updating the chaosresult after stopped
failStep := "CPU hog Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := KillStressCPUSerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
}
}
return nil
}
// InjectChaosInParallelMode stresses the CPU of all target applications in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"container": experimentsDetails.TargetContainer,
"Pod": pod.Name,
"CPU CORE": experimentsDetails.CPUcores,
})
go StressCPU(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosInjectCmd, clients)
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
err := KillStressCPUParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients)
if err != nil {
klog.V(0).Infof("Error in Kill stress after abortion")
return err
}
// updating the chaosresult after stopped
failStep := "CPU hog Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := KillStressCPUParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
}
return nil
}
//PrepareCPUstress contains the steps for preparation before chaos
func PrepareCPUstress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
err := ExperimentCPU(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
//GetTargetContainer will fetch the container name from application pod
// It will return the first container name from the application pod
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
return pod.Spec.Containers[0].Name, nil
}
// KillStressCPUSerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func KillStressCPUSerial(containerName, podName, namespace, cpuFreeCmd string, clients clients.ClientSets) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", cpuFreeCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to kill the stress process in %v pod, err: %v", podName, err)
}
return nil
}
// KillStressCPUParallel function to kill all the stress processes running inside the target containers
// Triggered by either timeout of chaos duration or termination of the experiment
func KillStressCPUParallel(containerName string, targetPodList corev1.PodList, namespace, cpuFreeCmd string, clients clients.ClientSets) error {
for _, pod := range targetPodList.Items {
if err := KillStressCPUSerial(containerName, pod.Name, namespace, cpuFreeCmd, clients); err != nil {
return err
}
}
return nil
}

@@ -1,33 +1,28 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/math"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/workloads"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDeleteFault")
defer span.End()
var err error
//PreparePodDelete contains the prepration steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Getting the iteration count for the pod deletion
GetIterations(experimentsDetails)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -35,25 +30,9 @@ func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.E
common.WaitForDuration(experimentsDetails.RampTime)
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
err = PodDeleteChaos(experimentsDetails, clients, eventsDetails, chaosDetails, resultDetails)
if err != nil {
return errors.Errorf("Unable to delete the application pods, err: %v", err)
}
//Waiting for the ramp time after chaos injection
@@ -64,143 +43,32 @@ func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.E
return nil
}
// injectChaosInSerialMode deletes the target application pods in serial mode (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
//GetIterations derive the iterations value from given parameters
func GetIterations(experimentsDetails *experimentTypes.ExperimentDetails) {
var Iterations int
if experimentsDetails.ChaosInterval != 0 {
Iterations = experimentsDetails.ChaosDuration / experimentsDetails.ChaosInterval
} else {
Iterations = 0
}
GracePeriod := int64(0)
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Deleting the application pod
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Killing the following pods", logrus.Fields{
"PodName": pod.Name})
if experimentsDetails.Force {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != "" {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
waitTime, _ := strconv.Atoi(experimentsDetails.ChaosInterval)
common.WaitForDuration(waitTime)
}
}
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
log.Infof("[Completion]: %v chaos is done", experimentsDetails.ExperimentName)
return nil
experimentsDetails.Iterations = math.Maximum(Iterations, 1)
}
// injectChaosInParallelMode deletes the target application pods in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//PodDeleteChaos deletes the random single/multiple pods
func PodDeleteChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
GracePeriod := int64(0)
//ChaosStartTimeStamp contains the timestamp at which the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
ChaosStartTimeStamp := time.Now().Unix()
for count := 0; count < experimentsDetails.Iterations; count++ {
for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
if experimentsDetails.EngineName != "" {
@ -215,53 +83,42 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
log.InfoWithValues("[Info]: Killing the following pods", logrus.Fields{
"PodName": pod.Name})
if experimentsDetails.Force {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
if experimentsDetails.Force == true {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
}
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
return err
}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != "" {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
waitTime, _ := strconv.Atoi(experimentsDetails.ChaosInterval)
common.WaitForDuration(waitTime)
}
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != 0 {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
}
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
//ChaosCurrentTimeStamp contains the current timestamp
ChaosCurrentTimeStamp := time.Now().Unix()
//ChaosDiffTimeStamp contains the difference between the current timestamp and the start timestamp
//It is used to track the total chaos duration
chaosDiffTimeStamp := ChaosCurrentTimeStamp - ChaosStartTimeStamp
if int(chaosDiffTimeStamp) >= experimentsDetails.ChaosDuration {
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break
}
}
log.Infof("[Completion]: %v chaos is done", experimentsDetails.ExperimentName)
return nil
}
// SetChaosTunables will setup a random value within a given range of values
// If the value is not provided in range it'll setup the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
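
The older side of this diff derives a fixed number of iterations from the chaos duration and interval (GetIterations) and clamps it to at least one. A minimal sketch of that calculation, assuming plain integer seconds; the function name `iterations` is hypothetical and not part of litmus-go:

```go
package main

import "fmt"

// iterations returns how many chaos rounds fit into duration when each round
// is followed by a wait of interval seconds. At least one round always runs,
// mirroring the "maximum of Iterations and 1" step in GetIterations.
// Hypothetical helper for illustration only.
func iterations(duration, interval int) int {
	if interval <= 0 {
		// No usable interval: fall back to a single round.
		return 1
	}
	if n := duration / interval; n > 1 {
		return n
	}
	return 1
}

func main() {
	fmt.Println(iterations(60, 10)) // six rounds of 10s fit into 60s
	fmt.Println(iterations(5, 10))  // interval longer than duration: one round
	fmt.Println(iterations(30, 0))  // interval unset: one round
}
```

The newer side drops the precomputed count and instead loops while the elapsed time is below ChaosDuration, which also accounts for the time spent deleting pods and verifying their status.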


@ -1,298 +0,0 @@
package helper
import (
"bytes"
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/exec"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-dns-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var (
abort, injectAbort chan os.Signal
err error
)
const (
// ProcessAlreadyKilled contains error code when process is already killed
ProcessAlreadyKilled = "no such process"
)
// Helper injects the dns chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodDNSFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// injectAbort channel is used to transmit signal notifications.
injectAbort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Catch and relay certain signal(s) to injectAbort channel.
signal.Notify(injectAbort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed for the helper pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := preparePodDNSChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// preparePodDNSChaos contains the preparation steps before chaos injection
func preparePodDNSChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-injectAbort:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
done := make(chan error, 1)
for index, t := range targets {
targets[index].Cmd, err = injectChaos(experimentsDetails, t)
if err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := terminateProcess(t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Info("[Wait]: Waiting for chaos completion")
// goroutine that reports the completion of the dns interceptor processes on the done channel
go func() {
var errList []string
for _, t := range targets {
if err := t.Cmd.Wait(); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
log.Errorf("err: %v", strings.Join(errList, ", "))
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
// return so that a second value is not sent on the done channel
return
}
done <- nil
}()
// check the timeout for the command
// Note: the timeout fires when the process has not completed 30s after the chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
var errList []string
for _, t := range targets {
if err = terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
case doneErr := <-done:
select {
case <-injectAbort:
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
default:
log.Info("[Info]: Reverting Chaos")
var errList []string
for _, t := range targets {
if err := terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return doneErr
}
}
return nil
}
func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, t targetDetails) (*exec.Cmd, error) {
// prepare dns interceptor
var out bytes.Buffer
commandTemplate := fmt.Sprintf("sudo TARGET_PID=%d CHAOS_TYPE=%s SPOOF_MAP='%s' TARGET_HOSTNAMES='%s' CHAOS_DURATION=%d MATCH_SCHEME=%s nsutil -p -n -t %d -- dns_interceptor", t.Pid, experimentsDetails.ChaosType, experimentsDetails.SpoofMap, experimentsDetails.TargetHostNames, experimentsDetails.ChaosDuration, experimentsDetails.MatchScheme, t.Pid)
cmd := exec.Command("/bin/bash", "-c", commandTemplate)
log.Info(cmd.String())
cmd.Stdout = &out
cmd.Stderr = &out
if err = cmd.Start(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: experimentsDetails.ChaosPodName, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to inject chaos: %s", out.String())}
}
return cmd, nil
}
func terminateProcess(t targetDetails) error {
// kill command
killTemplate := fmt.Sprintf("sudo kill %d", t.Cmd.Process.Pid)
kill := exec.Command("/bin/bash", "-c", killTemplate)
var out bytes.Buffer
kill.Stderr = &out
kill.Stdout = &out
if err = kill.Run(); err != nil {
if strings.Contains(strings.ToLower(out.String()), ProcessAlreadyKilled) {
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to revert chaos %s", out.String())}
} else {
log.Info("dns interceptor process stopped")
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
return nil
}
// abortWatcher waits for the abort signal and reverts the chaos when it arrives
func abortWatcher(targets []targetDetails, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("[Abort]: Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
if err = terminateProcess(t); err != nil {
log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "60"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.TargetHostNames = types.Getenv("TARGET_HOSTNAMES", "")
experimentDetails.SpoofMap = types.Getenv("SPOOF_MAP", "")
experimentDetails.MatchScheme = types.Getenv("MATCH_SCHEME", "exact")
experimentDetails.ChaosType = types.Getenv("CHAOS_TYPE", "error")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
CommandPid int
Cmd *exec.Cmd
Source string
}


@ -1,253 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-dns-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDNSFault")
defer span.End()
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the DNS chaos into the target application pods serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform DNS Chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode injects the DNS chaos into the target application pods in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodDNSFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "cri-socket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"./helpers -name dns-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
MountPath: experimentsDetails.SocketPath,
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
},
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env variables required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("TARGET_HOSTNAMES", experimentsDetails.TargetHostNames).
SetEnv("SPOOF_MAP", experimentsDetails.SpoofMap).
SetEnv("MATCH_SCHEME", experimentsDetails.MatchScheme).
SetEnv("CHAOS_TYPE", experimentsDetails.ChaosType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
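
In parallel mode the code above creates one helper pod per node and passes it every target on that node as a single `TARGETS` string of `name:namespace:container` entries joined by `;`. A minimal sketch of that grouping, assuming a simplified `target` struct; the type and the `targetsPerNode` function are hypothetical and stand in for `common.FilterPodsForNodes`, whose real implementation is not shown in this diff:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// target is a simplified, hypothetical stand-in for a pod selected for chaos.
type target struct {
	Name, Namespace, Container, Node string
}

// targetsPerNode groups targets by node and encodes each group as
// "name:namespace:container" entries joined by ";".
func targetsPerNode(targets []target) map[string]string {
	grouped := map[string][]string{}
	for _, t := range targets {
		entry := fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container)
		grouped[t.Node] = append(grouped[t.Node], entry)
	}
	out := make(map[string]string, len(grouped))
	for node, entries := range grouped {
		out[node] = strings.Join(entries, ";")
	}
	return out
}

func main() {
	perNode := targetsPerNode([]target{
		{"web-0", "default", "nginx", "node-a"},
		{"web-1", "default", "nginx", "node-a"},
		{"db-0", "data", "postgres", "node-b"},
	})

	// Sort node names so the output order is stable.
	nodes := make([]string, 0, len(perNode))
	for node := range perNode {
		nodes = append(nodes, node)
	}
	sort.Strings(nodes)
	for _, node := range nodes {
		fmt.Printf("%s -> %s\n", node, perNode[node])
	}
}
```

Grouping by node keeps the number of privileged helper pods equal to the number of affected nodes rather than the number of target pods.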


@ -1,308 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-fio-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
// PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodFIOStressFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Fio stress experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressStorage uses the REST API to exec into the target container of the target pod.
// It keeps increasing the storage utilisation until it reaches the maximum available or allowed size.
// TOTAL_CHAOS_DURATION specifies how long the experiment lasts.
func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
log.Infof("The storage consumption is: %vM", experimentDetails.Size)
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
fioCmd := fmt.Sprintf("fio --name=testchaos --ioengine=%v --iodepth=%v --rw=%v --bs=%v --size=%vM --numjobs=%v", experimentDetails.IOEngine, experimentDetails.IODepth, experimentDetails.ReadWrite, experimentDetails.BlockSize, experimentDetails.Size, experimentDetails.NumJobs)
if experimentDetails.GroupReporting {
fioCmd += " --group_reporting"
}
log.Infof("Running the command:\n%v", fioCmd)
command := []string{"/bin/sh", "-c", fioCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentDetails.TargetContainer, ns)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
// experimentExecution orchestrates the experiment by calling stressStorage for the target container of every targeted pod
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the storage of all target applications in serial mode (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInSerialMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Space Consumption(MB)": experimentsDetails.Size,
})
go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// Any error from the stress command other than exit code 137 stops the execution and marks the result as fail.
// Exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as pass.
// An OOM kill occurs when the stressed resource exceeds the resource limit of the target container.
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress storage of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
// injectChaosInParallelMode stresses the storage of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Storage Consumption(MB)": experimentsDetails.Size,
})
go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// Any error from the stress command other than exit code 137 stops the execution and marks the result as fail.
// Exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as pass.
// An OOM kill occurs when the stressed resource exceeds the resource limit of the target container.
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to inject chaos: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break loop
}
}
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
return nil
}
// killStressSerial kills the stress process running inside the target container.
//
// It is triggered either when the chaos duration times out or when the experiment is terminated.
func killStressSerial(containerName, podName, namespace, KillCmd string, clients clients.ClientSets) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", KillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
return nil
}
// killStressParallel kills the stress processes running inside the target container of every target pod.
// It is triggered either when the chaos duration times out or when the experiment is terminated.
func killStressParallel(containerName string, targetPodList corev1.PodList, KillCmd string, clients clients.ClientSets) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressSerial(containerName, pod.Name, pod.Namespace, KillCmd, clients); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}


@ -1,334 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
var inject chan os.Signal
// PrepareMemoryExecStress contains the chaos preparation and injection steps
func PrepareMemoryExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodMemoryHogExecFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
if err := experimentMemory(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not stress memory")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressMemory uses the REST API to exec into the target container of the target pod.
// The function keeps increasing memory utilisation until it reaches the maximum available or allowed amount.
// TOTAL_CHAOS_DURATION specifies how long the experiment lasts.
func stressMemory(MemoryConsumption, containerName, podName, namespace string, clients clients.ClientSets, stressErr chan error) {
log.Infof("The memory consumption is: %v", MemoryConsumption)
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
ddCmd := fmt.Sprintf("dd if=/dev/zero of=/dev/null bs=%sM", MemoryConsumption)
command := []string{"/bin/sh", "-c", ddCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
// experimentMemory orchestrates the experiment by calling stressMemory for the target container of every targeted pod
func experimentMemory(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the memory of all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-inject:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range targetPodList.Items {
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// Any error from the stress command other than exit code 137 stops the execution and marks the result as fail.
// Exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as pass.
// An OOM kill occurs when the memory to be stressed exceeds the resource limit of the target container.
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert memory stress")
}
}
}
return nil
}
// injectChaosInParallelMode stresses the memory of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-inject:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
//It checks the empty target container for the first iteration only
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
}
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// Any error from the stress command other than exit code 137 stops the execution and marks the result as fail.
// Exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as pass.
// An OOM kill occurs when the memory to be stressed exceeds the resource limit of the target container.
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break loop
}
}
return killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails)
}
// killStressMemorySerial kills the stress process running inside the target container.
//
// It is triggered either when the chaos duration times out or when the experiment is terminated.
func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", memFreeCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil
}
// killStressMemoryParallel kills the stress processes running inside the target container of every target pod.
// It is triggered either when the chaos duration times out or when the experiment is terminated.
func killStressMemoryParallel(containerName string, targetPodList corev1.PodList, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressMemorySerial(containerName, pod.Name, pod.Namespace, memFreeCmd, clients, chaosDetails); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
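
The memory stress in this file comes down to a single dd invocation passed to /bin/sh -c inside the target container. The following is a minimal sketch of assembling that argument list with a format verb rather than string concatenation. The helper buildDDCommand is hypothetical and not part of litmus-go; it only builds the slice and does not execute anything.

```go
package main

import "fmt"

// buildDDCommand is a hypothetical helper, not part of litmus-go.
// It returns the argument list for a shell that runs dd with a block
// size of memoryMB megabytes, mirroring the command used by stressMemory.
func buildDDCommand(memoryMB int) []string {
	// %d formats the integer directly, so no strconv.Itoa is needed and the
	// format string stays constant.
	ddCmd := fmt.Sprintf("dd if=/dev/zero of=/dev/null bs=%dM", memoryMB)
	return []string{"/bin/sh", "-c", ddCmd}
}

func main() {
	for _, arg := range buildDDCommand(500) {
		fmt.Println(arg)
	}
}
```

Keeping the format string constant also avoids the go vet warning about non-constant format strings.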


@ -0,0 +1,296 @@
package lib
import (
"fmt"
"os"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/klog"
)
var err error
// StressMemory uses the REST API to exec into the target container of the target pod.
// The function keeps increasing memory utilisation until it reaches the maximum available or allowed amount.
// TOTAL_CHAOS_DURATION specifies how long the experiment lasts.
func StressMemory(MemoryConsumption, containerName, podName, namespace string, clients clients.ClientSets, stressErr chan error) {
log.Infof("The memory consumption is: %v", MemoryConsumption)
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
ddCmd := fmt.Sprintf("dd if=/dev/zero of=/dev/null bs=%sM", MemoryConsumption)
command := []string{"/bin/sh", "-c", ddCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
// ExperimentMemory orchestrates the experiment by calling StressMemory for the target container of every targeted pod
func ExperimentMemory(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
}
return nil
}
// InjectChaosInSerialMode stresses the memory of all target applications serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"container": experimentsDetails.TargetContainer,
"Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go StressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// Any error from the stress command other than exit code 137 stops the execution and marks the result as fail.
// Exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as pass.
// An OOM kill occurs when the memory to be stressed exceeds the resource limit of the target container.
if err != nil {
if strings.Contains(err.Error(), "137") {
return nil
}
return err
}
case <-signChan:
log.Info("[Chaos]: Killing the stress process because a termination signal was received")
err = KillStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients)
if err != nil {
klog.V(0).Infof("Error in Kill stress after abortion")
return err
}
// updating the chaosresult after stopped
failStep := "Memory hog Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.Summary, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err = KillStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
}
}
return nil
}
// InjectChaosInParallelMode stresses the memory of all target applications in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// creating an error channel to receive errors from the goroutines
stressErr := make(chan error)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"container": experimentsDetails.TargetContainer,
"Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go StressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr)
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay interrupt and termination signals to signChan. SIGKILL cannot be caught, so it is not registered.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// If the stress command fails with anything other than exit code 137, stop the execution and mark the result as failed.
// Exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as passed.
// An OOM kill occurs when the memory to be stressed exceeds the resource limit of the target container.
if err != nil {
if strings.Contains(err.Error(), "137") {
return nil
}
return err
}
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
err = KillStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients)
if err != nil {
klog.V(0).Infof("Error in Kill stress after abortion")
return err
}
// updating the chaosresult after stopped
failStep := "Memory hog Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.Summary, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break loop
}
}
if err = KillStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
}
return nil
}
// PrepareMemoryStress contains the preparation steps before chaos injection
func PrepareMemoryStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
err := ExperimentMemory(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
//GetTargetContainer will fetch the container name from application pod
// It will return the first container name from the application pod
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
return pod.Spec.Containers[0].Name, nil
}
// KillStressMemorySerial kills the stress process running inside the target container.
// It is triggered either when the chaos duration times out or when the experiment is terminated.
func KillStressMemorySerial(containerName, podName, namespace, memFreeCmd string, clients clients.ClientSets) error {
// execCommandDetails holds the pod and container details required by the exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", memFreeCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to kill stress process inside target container, err: %v", err)
}
return nil
}
// KillStressMemoryParallel kills the stress processes running inside the target container of every target pod.
// It is triggered either when the chaos duration times out or when the experiment is terminated.
func KillStressMemoryParallel(containerName string, targetPodList corev1.PodList, namespace, memFreeCmd string, clients clients.ClientSets) error {
for _, pod := range targetPodList.Items {
if err := KillStressMemorySerial(containerName, pod.Name, namespace, memFreeCmd, clients); err != nil {
return err
}
}
return nil
}
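
Note on the wait loop used by the injection functions above: a timer that is created inside the `for` loop restarts whenever the loop runs again, for example after a `nil` value arrives on the error channel. The following is a minimal sketch of the same three-way wait with the timer created once. The names `waitForChaos`, `isOOMKill`, `errExample`, and `demoWait` are hypothetical and are not part of litmus-go.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/signal"
	"strings"
	"syscall"
	"time"
)

// errExample stands in for the error text of an OOM-killed stress command (assumed wording).
var errExample = errors.New("command terminated with exit code 137")

// demoWait is a short duration used only for this demonstration.
const demoWait = 50 * time.Millisecond

// isOOMKill mirrors the substring check on exit code 137 used in the source.
func isOOMKill(err error) bool {
	return err != nil && strings.Contains(err.Error(), "137")
}

// waitForChaos blocks until the duration elapses, a signal arrives, or the
// stress goroutine reports a non-nil error. It returns a short outcome label.
func waitForChaos(d time.Duration, stressErr <-chan error, sig <-chan os.Signal) (string, error) {
	// The timer is created once, so a nil value on stressErr cannot restart the countdown.
	timer := time.NewTimer(d)
	defer timer.Stop()
	for {
		select {
		case err := <-stressErr:
			if err == nil {
				continue
			}
			if isOOMKill(err) {
				return "oom-killed", nil
			}
			return "failed", err
		case <-sig:
			return "aborted", nil
		case <-timer.C:
			return "completed", nil
		}
	}
}

func main() {
	sig := make(chan os.Signal, 1)
	// Only catchable signals are registered; SIGKILL can never be delivered to a handler.
	signal.Notify(sig, os.Interrupt, syscall.SIGTERM)
	defer signal.Stop(sig)

	stressErr := make(chan error, 1)
	stressErr <- errExample
	outcome, err := waitForChaos(demoWait, stressErr, sig)
	fmt.Println(outcome, err)
}
```

In this sketch the caller decides what to do with each outcome (kill the stress process, update the result, exit), which keeps the wait logic testable without a cluster.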

View File

@ -1,297 +0,0 @@
package lib
import (
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/palantir/stacktrace"
"strings"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"gopkg.in/yaml.v2"
corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/intstr"
)
const (
// AllIPs cidr contains all ips
AllIPs string = "0.0.0.0/0"
)
// NetworkPolicy contains details about the network-policy
type NetworkPolicy struct {
TargetPodLabels map[string]string
PolicyType []networkv1.PolicyType
Egress []networkv1.NetworkPolicyEgressRule
Ingress []networkv1.NetworkPolicyIngressRule
ExceptIPs []string
NamespaceSelector map[string]string
PodSelector map[string]string
Ports []networkv1.NetworkPolicyPort
}
// Port contains the port details
type Port struct {
TCP []int32 `json:"tcp"`
UDP []int32 `json:"udp"`
SCTP []int32 `json:"sctp"`
}
// initialize creates an instance of network policy struct
func initialize() *NetworkPolicy {
return &NetworkPolicy{}
}
// getNetworkPolicyDetails collects all the data required for network policy
func (np *NetworkPolicy) getNetworkPolicyDetails(experimentsDetails *experimentTypes.ExperimentDetails) error {
np.setLabels(experimentsDetails.AppLabel).
setPolicy(experimentsDetails.PolicyTypes).
setPodSelector(experimentsDetails.PodSelector).
setNamespaceSelector(experimentsDetails.NamespaceSelector)
// sets the ports for the traffic control
if err := np.setPort(experimentsDetails.PORTS); err != nil {
return stacktrace.Propagate(err, "could not set port")
}
// sets the destination ips for which the traffic should be blocked
if err := np.setExceptIPs(experimentsDetails); err != nil {
return stacktrace.Propagate(err, "could not set ips")
}
// sets the egress traffic rules
if strings.ToLower(experimentsDetails.PolicyTypes) == "egress" || strings.ToLower(experimentsDetails.PolicyTypes) == "all" {
np.setEgressRules()
}
// sets the ingress traffic rules
if strings.ToLower(experimentsDetails.PolicyTypes) == "ingress" || strings.ToLower(experimentsDetails.PolicyTypes) == "all" {
np.setIngressRules()
}
return nil
}
// setLabels sets the target application label
func (np *NetworkPolicy) setLabels(appLabel string) *NetworkPolicy {
key, value := getKeyValue(appLabel)
if key != "" || value != "" {
np.TargetPodLabels = map[string]string{
key: value,
}
}
return np
}
// getKeyValue returns the key & value from the label
func getKeyValue(label string) (string, string) {
labels := strings.Split(label, "=")
switch {
case len(labels) == 2:
return labels[0], labels[1]
default:
return labels[0], ""
}
}
// setPolicy sets the network policy types
func (np *NetworkPolicy) setPolicy(policy string) *NetworkPolicy {
switch strings.ToLower(policy) {
case "ingress":
np.PolicyType = []networkv1.PolicyType{networkv1.PolicyTypeIngress}
case "egress":
np.PolicyType = []networkv1.PolicyType{networkv1.PolicyTypeEgress}
default:
np.PolicyType = []networkv1.PolicyType{networkv1.PolicyTypeEgress, networkv1.PolicyTypeIngress}
}
return np
}
// setPodSelector sets the pod labels selector
func (np *NetworkPolicy) setPodSelector(podLabel string) *NetworkPolicy {
podSelector := map[string]string{}
labels := strings.Split(podLabel, ",")
for i := range labels {
key, value := getKeyValue(labels[i])
if key != "" || value != "" {
podSelector[key] = value
}
}
np.PodSelector = podSelector
return np
}
// setNamespaceSelector sets the namespace labels selector
func (np *NetworkPolicy) setNamespaceSelector(nsLabel string) *NetworkPolicy {
nsSelector := map[string]string{}
labels := strings.Split(nsLabel, ",")
for i := range labels {
key, value := getKeyValue(labels[i])
if key != "" || value != "" {
nsSelector[key] = value
}
}
np.NamespaceSelector = nsSelector
return np
}
// setPort sets all the protocols and ports
func (np *NetworkPolicy) setPort(p string) error {
var ports []networkv1.NetworkPolicyPort
var port Port
// unmarshal the protocols and ports from the env
if err := yaml.Unmarshal([]byte(strings.TrimSpace(parseCommand(p))), &port); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("failed to unmarshal ports: %s", err.Error())}
}
// sets all the tcp ports
for _, p := range port.TCP {
ports = append(ports, getPort(p, corev1.ProtocolTCP))
}
// sets all the udp ports
for _, p := range port.UDP {
ports = append(ports, getPort(p, corev1.ProtocolUDP))
}
// sets all the sctp ports
for _, p := range port.SCTP {
ports = append(ports, getPort(p, corev1.ProtocolSCTP))
}
np.Ports = ports
return nil
}
// getPort returns the port details
func getPort(port int32, protocol corev1.Protocol) networkv1.NetworkPolicyPort {
networkPorts := networkv1.NetworkPolicyPort{
Protocol: &protocol,
Port: &intstr.IntOrString{
Type: intstr.Int,
IntVal: port,
},
}
return networkPorts
}
// setExceptIPs sets all the destination ips
// for which traffic should be blocked
func (np *NetworkPolicy) setExceptIPs(experimentsDetails *experimentTypes.ExperimentDetails) error {
// get all the target ips
destinationIPs, err := network_chaos.GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients.ClientSets{}, false)
if err != nil {
return stacktrace.Propagate(err, "could not get destination ips")
}
ips := strings.Split(destinationIPs, ",")
var uniqueIps []string
// removing all the duplicates and ipv6 ips from the list, if any
for i := range ips {
isPresent := false
for j := range uniqueIps {
// entries in uniqueIps carry the "/32" suffix, so compare against the suffixed form
if ips[i]+"/32" == uniqueIps[j] {
isPresent = true
}
}
if ips[i] != "" && !isPresent && !strings.Contains(ips[i], ":") {
uniqueIps = append(uniqueIps, ips[i]+"/32")
}
}
np.ExceptIPs = uniqueIps
return nil
}
// setIngressRules sets the ingress traffic rules
func (np *NetworkPolicy) setIngressRules() *NetworkPolicy {
if len(np.getPeers()) != 0 || len(np.Ports) != 0 {
np.Ingress = []networkv1.NetworkPolicyIngressRule{
{
From: np.getPeers(),
Ports: np.Ports,
},
}
}
return np
}
// setEgressRules sets the egress traffic rules
func (np *NetworkPolicy) setEgressRules() *NetworkPolicy {
if len(np.getPeers()) != 0 || len(np.Ports) != 0 {
np.Egress = []networkv1.NetworkPolicyEgressRule{
{
To: np.getPeers(),
Ports: np.Ports,
},
}
}
return np
}
// getPeers returns the peers' ips, namespace selectors, and pod selectors
func (np *NetworkPolicy) getPeers() []networkv1.NetworkPolicyPeer {
var peers []networkv1.NetworkPolicyPeer
// sets the namespace selectors
if np.NamespaceSelector != nil && len(np.NamespaceSelector) != 0 {
peers = append(peers, np.getNamespaceSelector())
}
// sets the pod selectors
if np.PodSelector != nil && len(np.PodSelector) != 0 {
peers = append(peers, np.getPodSelector())
}
// sets the ipblocks
if np.ExceptIPs != nil && len(np.ExceptIPs) != 0 {
peers = append(peers, np.getIPBlocks())
}
return peers
}
// getNamespaceSelector builds the namespace selector
func (np *NetworkPolicy) getNamespaceSelector() networkv1.NetworkPolicyPeer {
nsSelector := networkv1.NetworkPolicyPeer{
NamespaceSelector: &v1.LabelSelector{
MatchLabels: np.NamespaceSelector,
},
}
return nsSelector
}
// getPodSelector builds the pod selectors
func (np *NetworkPolicy) getPodSelector() networkv1.NetworkPolicyPeer {
podSelector := networkv1.NetworkPolicyPeer{
PodSelector: &v1.LabelSelector{
MatchLabels: np.PodSelector,
},
}
return podSelector
}
// getIPBlocks builds the ipblocks
func (np *NetworkPolicy) getIPBlocks() networkv1.NetworkPolicyPeer {
ipBlocks := networkv1.NetworkPolicyPeer{
IPBlock: &networkv1.IPBlock{
CIDR: AllIPs,
Except: np.ExceptIPs,
},
}
return ipBlocks
}
// parseCommand puts each ", "-separated protocol entry on its own line so the value can be unmarshalled as YAML
func parseCommand(command string) string {
final := ""
c := strings.Split(command, ", ")
for i := range c {
final = final + strings.TrimSpace(c[i]) + "\n"
}
return final
}
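
Note on `setExceptIPs`: the de-duplication step can also be written with a set keyed on the bare IP, which avoids the nested loop. This is a minimal sketch, not the litmus-go implementation; the function name `uniqueIPv4CIDRs` is hypothetical, and the whitespace trimming is an addition that the source does not perform.

```go
package main

import (
	"fmt"
	"strings"
)

// uniqueIPv4CIDRs turns a comma-separated IP list into de-duplicated /32 CIDRs.
// Empty entries and IPv6 addresses (detected by ':', as in the source) are skipped.
func uniqueIPv4CIDRs(csv string) []string {
	seen := map[string]struct{}{}
	var cidrs []string
	for _, ip := range strings.Split(csv, ",") {
		ip = strings.TrimSpace(ip) // assumption: tolerate stray spaces around entries
		if ip == "" || strings.Contains(ip, ":") {
			continue
		}
		if _, dup := seen[ip]; dup {
			continue
		}
		seen[ip] = struct{}{}
		cidrs = append(cidrs, ip+"/32")
	}
	return cidrs
}

func main() {
	fmt.Println(uniqueIPv4CIDRs("10.0.0.1,10.0.0.2,10.0.0.1,,fe80::1"))
}
```

The resulting slice has the shape expected by the `Except` field of the `IPBlock` built in `getIPBlocks`.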

View File

@ -1,260 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var (
inject, abort chan os.Signal
)
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkPartitionFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// validate the appLabels
if chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide the appLabel"}
}
// Get the target pod details for the chaos execution
targetPodList, err := common.GetPodList("", 100, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
// generate a unique string
runID := stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// collect all the data for the network policy
np := initialize()
if err := np.getNetworkPolicyDetails(experimentsDetails); err != nil {
return stacktrace.Propagate(err, "could not get network policy details")
}
//DISPLAY THE NETWORK POLICY DETAILS
log.InfoWithValues("The Network policy details are as follows", logrus.Fields{
"Target Label": np.TargetPodLabels,
"Policy Type": np.PolicyType,
"PodSelector": np.PodSelector,
"NamespaceSelector": np.NamespaceSelector,
"Destination IPs": np.ExceptIPs,
"Ports": np.Ports,
})
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, clients, chaosDetails, resultDetails, &targetPodList, runID)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// creating the network policy to block the traffic
if err := createNetworkPolicy(ctx, experimentsDetails, clients, np, runID); err != nil {
return stacktrace.Propagate(err, "could not create network policy")
}
// updating chaos status to injected for the target pods
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
}
}
// verify the presence of network policy inside cluster
if err := checkExistenceOfPolicy(experimentsDetails, clients, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return stacktrace.Propagate(err, "could not check existence of network policy")
}
log.Infof("[Wait]: Wait for %v chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// deleting the network policy after chaos duration over
if err := deleteNetworkPolicy(experimentsDetails, clients, &targetPodList, chaosDetails, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return stacktrace.Propagate(err, "could not delete network policy")
}
// updating chaos status to reverted for the target pods
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createNetworkPolicy creates the network policy in the application namespace
// it blocks ingress/egress traffic for the targeted application for specific/all IPs
func createNetworkPolicy(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, networkPolicy *NetworkPolicy, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkPartitionFault")
defer span.End()
np := &networkv1.NetworkPolicy{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-np-" + runID,
Namespace: experimentsDetails.AppNS,
Labels: map[string]string{
"name": experimentsDetails.ExperimentName + "-np-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
},
Spec: networkv1.NetworkPolicySpec{
PodSelector: v1.LabelSelector{
MatchLabels: networkPolicy.TargetPodLabels,
},
PolicyTypes: networkPolicy.PolicyType,
Egress: networkPolicy.Egress,
Ingress: networkPolicy.Ingress,
},
}
_, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Create(context.Background(), np, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to create network policy: %s", err.Error())}
}
return nil
}
// deleteNetworkPolicy deletes the network policy and waits until it is completely removed
func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, targetPodList *corev1.PodList, chaosDetails *types.ChaosDetails, timeout, delay int, runID string) error {
name := experimentsDetails.ExperimentName + "-np-" + runID
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
if err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Delete(context.Background(), name, v1.DeleteOptions{}); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{name: %s, namespace: %s}", name, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to delete network policy: %s", err.Error())}
}
err := retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) != 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "network policies are not deleted within timeout"}
}
return nil
})
if err != nil {
return err
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
return nil
}
// checkExistenceOfPolicy validates the presence of the network policy inside the application namespace
func checkExistenceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, timeout, delay int, runID string) error {
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
return retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "no network policy found with matching labels"}
}
return nil
})
}
// abortWatcher waits for the abort signal and then reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, targetPodList *corev1.PodList, runID string) {
// waiting till the abort signal received
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err := checkExistenceOfPolicy(experimentsDetails, clients, 2, 1, runID); err != nil {
// cerr is used instead of shadowing the builtin error type
if cerr, ok := err.(cerrors.Error); ok {
if strings.Contains(cerr.Reason, "no network policy found with matching labels") {
break
}
}
log.Infof("could not verify the network policy, err: %v", err.Error())
retry--
continue
}
if err := deleteNetworkPolicy(experimentsDetails, clients, targetPodList, chaosDetails, 2, 1, runID); err != nil {
log.Errorf("unable to delete network policy, err: %v", err)
}
retry--
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("Chaos Revert Completed")
os.Exit(0)
}
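
Note on `abortWatcher`: the revert loop decides whether the policy is already gone by matching on the text of the error's `Reason`. The sketch below shows the same bounded-retry idea with a sentinel error and `errors.Is`, under the assumption that the existence check can return such a sentinel. `revertWithRetry` and `errNotFound` are hypothetical names, and unlike the source this version returns as soon as a delete succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel meaning "no matching policy exists".
var errNotFound = errors.New("no network policy found")

// revertWithRetry makes up to attempts passes: it checks that the resource
// exists and, if so, removes it. It returns nil once the resource is gone,
// or the last error seen when every attempt fails.
func revertWithRetry(attempts int, exists func() error, remove func() error) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		if err := exists(); err != nil {
			if errors.Is(err, errNotFound) {
				return nil // nothing left to revert
			}
			lastErr = err
			continue
		}
		if err := remove(); err != nil {
			lastErr = err
			continue
		}
		return nil
	}
	return lastErr
}

func main() {
	// Simulate a policy that is present until the first delete call.
	present := true
	exists := func() error {
		if !present {
			return errNotFound
		}
		return nil
	}
	remove := func() error {
		present = false
		return nil
	}
	fmt.Println(revertWithRetry(3, exists, remove))
}
```

Passing the check and delete steps as functions keeps the retry policy separate from the Kubernetes client calls, so it can be exercised without a cluster.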

View File

@ -1,260 +0,0 @@
package lib
import (
"fmt"
"go.opentelemetry.io/otel"
"golang.org/x/net/context"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/rds"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareRDSInstanceStop contains the preparation and injection steps for the experiment
func PrepareRDSInstanceStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRDSInstanceStop")
defer span.End()
// Inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// Abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Get the instance identifier or list of instance identifiers
instanceIdentifierList := strings.Split(experimentsDetails.RDSInstanceIdentifier, ",")
if experimentsDetails.RDSInstanceIdentifier == "" || len(instanceIdentifierList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no RDS instance identifier found to stop"}
}
instanceIdentifierList = common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, instanceIdentifierList)
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIdentifierList))
// Watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIdentifierList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stops the rds instances in serial mode, that is, one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops all the target RDS instances at once, waits for the chaos interval, and then starts them again
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the timestamp at which the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Stop all the target RDS instances
for _, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
}
// Run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
for _, identifier := range instanceIdentifierList {
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, identifier := range instanceIdentifierList {
instanceState, err := awslib.GetRDSInstanceStatus(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "available" {
log.Info("[Abort]: Waiting for the RDS instance to get down")
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting RDS instance as abort signal received")
err := awslib.RDSInstanceStart(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("RDS instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

@ -1,76 +0,0 @@
package lib
import (
"context"
"fmt"
"time"
redfishLib "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish-node-restart/types"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
// injectChaos initiates node restart chaos on the target node
func injectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectRedfishNodeRestartFault")
defer span.End()
URL := fmt.Sprintf("https://%v/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset", experimentsDetails.IPMIIP)
return redfishLib.RebootNode(URL, experimentsDetails.User, experimentsDetails.Password)
}
// experimentExecution function orchestrates the experiment by calling the injectChaos function
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.IPMIIP + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if err := injectChaos(ctx, experimentsDetails, clients); err != nil {
return stacktrace.Propagate(err, "chaos injection failed")
}
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
time.Sleep(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
return nil
}
// PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRedfishNodeRestartFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Redfish node restart experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
common.SetTargets(experimentsDetails.IPMIIP, "targeted", "node", chaosDetails)
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}

@ -1,403 +0,0 @@
package lib
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/spring-boot/spring-boot-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
)
var revertAssault = experimentTypes.ChaosMonkeyAssaultRevert{
LatencyActive: false,
KillApplicationActive: false,
CPUActive: false,
MemoryActive: false,
ExceptionsActive: false,
}
// SetTargetPodList selects the target pods and adds them to the experimentDetails
func SetTargetPodList(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
var err error
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
}
if experimentsDetails.TargetPodList, err = common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails); err != nil {
return err
}
return nil
}
// PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareSpringBootFault")
defer span.End()
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
log.InfoWithValues("[Info]: Chaos monkeys watchers will be injected to the target pods as follows", logrus.Fields{
"WebClient": experimentsDetails.ChaosMonkeyWatchers.WebClient,
"Service": experimentsDetails.ChaosMonkeyWatchers.Service,
"Component": experimentsDetails.ChaosMonkeyWatchers.Component,
"Repository": experimentsDetails.ChaosMonkeyWatchers.Repository,
"Controller": experimentsDetails.ChaosMonkeyWatchers.Controller,
"RestController": experimentsDetails.ChaosMonkeyWatchers.RestController,
})
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// CheckChaosMonkey verifies that chaos monkey for spring boot is reachable on every selected pod.
// All pods are checked even if some of them fail; a single failing pod makes the whole check fail.
func CheckChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, targetPods corev1.PodList) (bool, error) {
hasErrors := false
targetPodNames := []string{}
for _, pod := range targetPods.Items {
targetPodNames = append(targetPodNames, pod.Name)
endpoint := "http://" + pod.Status.PodIP + ":" + chaosMonkeyPort + chaosMonkeyPath
log.Infof("[Check]: Checking pod: %v (endpoint: %v)", pod.Name, endpoint)
resp, err := http.Get(endpoint)
if err != nil {
log.Errorf("failed to request chaos monkey endpoint on pod %s, %s", pod.Name, err.Error())
hasErrors = true
continue
}
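// only the status code is needed, so close the body right away to release the connection
resp.Body.Close()
if resp.StatusCode != 200 {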
log.Errorf("failed to get chaos monkey endpoint on pod %s (status: %d)", pod.Name, resp.StatusCode)
hasErrors = true
}
}
if hasErrors {
return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{podNames: %s}", targetPodNames), Reason: "failed to check chaos monkey on at least one pod, check logs for details"}
}
return true, nil
}
// enableChaosMonkey enables chaos monkey on selected pods
func enableChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error {
log.Infof("[Chaos]: Enabling Chaos Monkey on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/enable", "", nil)
if err != nil {
return err
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to enable chaos monkey endpoint (status: %d)", resp.StatusCode)}
}
return nil
}
func setChaosMonkeyWatchers(chaosMonkeyPort string, chaosMonkeyPath string, watchers experimentTypes.ChaosMonkeyWatchers, pod corev1.Pod) error {
log.Infof("[Chaos]: Setting Chaos Monkey watchers on pod: %v", pod.Name)
jsonValue, err := json.Marshal(watchers)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey watchers, %s", err.Error())}
}
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/watchers", "application/json", bytes.NewBuffer(jsonValue))
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set watchers, %s", err.Error())}
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set watchers (status: %d)", resp.StatusCode)}
}
return nil
}
func startAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault []byte, pod corev1.Pod) error {
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, assault, pod); err != nil {
return err
}
log.Infof("[Chaos]: Activating Chaos Monkey assault on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults/runtime/attack", "", nil)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to start assault %s", err.Error())}
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to activate runtime attack (status: %d)", resp.StatusCode)}
}
return nil
}
func setChaosMonkeyAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault []byte, pod corev1.Pod) error {
log.Infof("[Chaos]: Setting Chaos Monkey assault on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults", "application/json", bytes.NewBuffer(assault))
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set assault, %s", err.Error())}
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set assault (status: %d)", resp.StatusCode)}
}
return nil
}
// disableChaosMonkey disables chaos monkey on selected pods
func disableChaosMonkey(ctx context.Context, chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error {
log.Infof("[Chaos]: disabling assaults on pod %s", pod.Name)
jsonValue, err := json.Marshal(revertAssault)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey revert-chaos watchers, %s", err.Error())}
}
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, jsonValue, pod); err != nil {
return err
}
log.Infof("[Chaos]: disabling chaos monkey on pod %s", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/disable", "", nil)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to disable assault, %s", err.Error())}
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to disable chaos monkey endpoint (status: %d)", resp.StatusCode)}
}
return nil
}
// injectChaosInSerialMode injects chaos monkey assault on pods in serial mode(one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-signChan:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range experimentsDetails.TargetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
_ = events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: Injecting on target pod", logrus.Fields{
"Target Pod": pod.Name,
})
if err := setChaosMonkeyWatchers(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyWatchers, pod); err != nil {
log.Errorf("[Chaos]: Failed to set watchers, err: %v ", err)
return err
}
if err := startAssault(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyAssault, pod); err != nil {
log.Errorf("[Chaos]: Failed to set assault, err: %v ", err)
return err
}
if err := enableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("[Chaos]: Failed to enable chaos, err: %v ", err)
return err
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
endTime = time.After(timeDelay)
loop:
for {
select {
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
return err
}
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
}
return nil
}
// injectChaosInParallelMode injects chaos monkey assault on pods in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-signChan:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range experimentsDetails.TargetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
_ = events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Pod": pod.Name,
})
if err := setChaosMonkeyWatchers(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyWatchers, pod); err != nil {
log.Errorf("[Chaos]: Failed to set watchers, err: %v", err)
return err
}
if err := startAssault(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyAssault, pod); err != nil {
log.Errorf("[Chaos]: Failed to set assault, err: %v", err)
return err
}
if err := enableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("[Chaos]: Failed to enable chaos, err: %v", err)
return err
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
}
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
}
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Revert Started")
for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
var errorList []string
for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
errorList = append(errorList, err.Error())
continue
}
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
if len(errorList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("error in disabling chaos monkey, [%s]", strings.Join(errorList, ","))}
}
return nil
}

@ -1,697 +0,0 @@
package helper
import (
"bufio"
"bytes"
"context"
"fmt"
"io"
"os"
"os/exec"
"os/signal"
"path/filepath"
"strconv"
"strings"
"syscall"
"time"
"github.com/containerd/cgroups"
cgroupsv2 "github.com/containerd/cgroups/v2"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
)
// list of cgroups in a container
var (
cgroupSubsystemList = []string{"cpu", "memory", "systemd", "net_cls",
"net_prio", "freezer", "blkio", "perf_event", "devices", "cpuset",
"cpuacct", "pids", "hugetlb",
}
)
var (
err error
inject, abort chan os.Signal
)
const (
// ProcessAlreadyFinished contains error code when process is finished
ProcessAlreadyFinished = "os: process already finished"
// ProcessAlreadyKilled contains error code when process is already killed
ProcessAlreadyKilled = "no such process"
)
// Helper injects the stress chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodStressFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed for the helper pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := prepareStressChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// prepareStressChaos contains the chaos preparation and injection steps
func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
// get stressors in list format
stressorList := prepareStressor(experimentsDetails)
if len(stressorList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "fail to prepare stressors"}
}
stressors := strings.Join(stressorList, " ")
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []*targetDetails
for _, t := range targetList.Target {
td := &targetDetails{
Name: t.Name,
Namespace: t.Namespace,
Source: chaosDetails.ChaosPodName,
}
td.TargetContainers, err = common.GetTargetContainers(t.Name, t.Namespace, t.TargetContainer, chaosDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get target containers")
}
td.ContainerIds, err = common.GetContainerIDs(td.Namespace, td.Name, td.TargetContainers, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container ids")
}
for _, cid := range td.ContainerIds {
// extract out the pid of the target container
pid, err := common.GetPID(experimentsDetails.ContainerRuntime, cid, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
td.Pids = append(td.Pids, pid)
}
for i := range td.Pids {
cGroupManagers, err, grpPath := getCGroupManager(td, i)
if err != nil {
return stacktrace.Propagate(err, "could not get cgroup manager")
}
td.GroupPath = grpPath
td.CGroupManagers = append(td.CGroupManagers, cGroupManagers)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"TargetContainers": td.TargetContainers,
})
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
done := make(chan error, 1)
for index, t := range targets {
for i := range t.Pids {
cmd, err := injectChaos(t, stressors, i, experimentsDetails.StressType)
if err != nil {
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
targets[index].Cmds = append(targets[index].Cmds, cmd)
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
go func() {
var errList []string
var exitErr error
for _, t := range targets {
for i := range t.Cmds {
if err := t.Cmds[i].Cmd.Wait(); err != nil {
log.Infof("stress process failed, err: %v, out: %v", err, t.Cmds[i].Buffer.String())
if _, ok := err.(*exec.ExitError); ok {
exitErr = err
continue
}
errList = append(errList, err.Error())
}
}
}
// send exactly one result on done, so this goroutine never blocks on a second send
if exitErr != nil {
oomKilled, err := checkOOMKilled(targets, clients, exitErr)
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- exitErr
return
}
done <- nil
} else if len(errList) != 0 {
oomKilled, err := checkOOMKilled(targets, clients, fmt.Errorf("err: %v", strings.Join(errList, ", ")))
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
return
}
done <- nil
} else {
done <- nil
}
}()
// check the timeout for the command
// Note: the timeout fires when the process has not completed within 30s after the chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the stress process timed out before completing
log.Infof("[Chaos] The stress process did not complete within %vs (chaos duration plus a 30s grace period)", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
case err := <-done:
if err != nil {
exitErr, ok := err.(*exec.ExitError)
if ok {
status := exitErr.Sys().(syscall.WaitStatus)
if status.Signaled() {
log.Infof("process stopped with signal: %v", status.Signal())
}
if status.Signaled() && status.Signal() == syscall.SIGKILL {
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Source: chaosDetails.ChaosPodName, Reason: "process stopped with SIGKILL signal"}
}
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: chaosDetails.ChaosPodName, Reason: err.Error()}
}
log.Info("[Info]: Reverting Chaos")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
func revertChaosForAllTargets(targets []*targetDetails, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
if err := terminateProcess(targets[i]); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// checkOOMKilled checks if the container within the target pods failed due to an OOMKilled error.
func checkOOMKilled(targets []*targetDetails, clients clients.ClientSets, chaosError error) (bool, error) {
// Retry up to three times, one second apart, checking the target containers of every pod
for i := 0; i < 3; i++ {
for _, t := range targets {
// Fetch the target pod
targetPod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return false, cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace),
Reason: err.Error(),
}
}
for _, c := range targetPod.Status.ContainerStatuses {
if utils.Contains(c.Name, t.TargetContainers) {
// Check for OOMKilled and restart
if c.LastTerminationState.Terminated != nil && c.LastTerminationState.Terminated.ExitCode == 137 {
log.Warnf("[Warning]: The target container '%s' of pod '%s' got OOM Killed, err: %v", c.Name, t.Name, chaosError)
return true, nil
}
}
}
}
time.Sleep(1 * time.Second)
}
return false, nil
}
// terminateProcess will remove the stress process from the target container after chaos completion
func terminateProcess(t *targetDetails) error {
var errList []string
for i := range t.Cmds {
if t.Cmds[i] != nil && t.Cmds[i].Cmd.Process != nil {
if err := syscall.Kill(-t.Cmds[i].Cmd.Process.Pid, syscall.SIGKILL); err != nil {
if strings.Contains(err.Error(), ProcessAlreadyKilled) || strings.Contains(err.Error(), ProcessAlreadyFinished) {
continue
}
errList = append(errList, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[i]), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}.Error())
continue
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// prepareStressor will set the required stressors for the given experiment
func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []string {
stressArgs := []string{
"stress-ng",
"--timeout",
strconv.Itoa(experimentDetails.ChaosDuration) + "s",
}
switch experimentDetails.StressType {
case "pod-cpu-stress":
log.InfoWithValues("[Info]: Details of Stressor:", logrus.Fields{
"CPU Core": experimentDetails.CPUcores,
"CPU Load": experimentDetails.CPULoad,
"Timeout": experimentDetails.ChaosDuration,
})
stressArgs = append(stressArgs, "--cpu "+experimentDetails.CPUcores)
stressArgs = append(stressArgs, " --cpu-load "+experimentDetails.CPULoad)
case "pod-memory-stress":
log.InfoWithValues("[Info]: Details of Stressor:", logrus.Fields{
"Number of Workers": experimentDetails.NumberOfWorkers,
"Memory Consumption": experimentDetails.MemoryConsumption,
"Timeout": experimentDetails.ChaosDuration,
})
stressArgs = append(stressArgs, "--vm "+experimentDetails.NumberOfWorkers+" --vm-bytes "+experimentDetails.MemoryConsumption+"M")
case "pod-io-stress":
var hddbytes string
if experimentDetails.FilesystemUtilizationBytes == "0" {
if experimentDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = "10%"
log.Info("Neither FilesystemUtilizationPercentage nor FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = experimentDetails.FilesystemUtilizationPercentage + "%"
}
} else {
if experimentDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = experimentDetails.FilesystemUtilizationBytes + "G"
} else {
hddbytes = experimentDetails.FilesystemUtilizationPercentage + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
log.InfoWithValues("[Info]: Details of Stressor:", logrus.Fields{
"io": experimentDetails.NumberOfWorkers,
"hdd": experimentDetails.NumberOfWorkers,
"hdd-bytes": hddbytes,
"Timeout": experimentDetails.ChaosDuration,
"Volume Mount Path": experimentDetails.VolumeMountPath,
})
if experimentDetails.VolumeMountPath == "" {
stressArgs = append(stressArgs, "--io "+experimentDetails.NumberOfWorkers+" --hdd "+experimentDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes)
} else {
stressArgs = append(stressArgs, "--io "+experimentDetails.NumberOfWorkers+" --hdd "+experimentDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes+" --temp-path "+experimentDetails.VolumeMountPath)
}
if experimentDetails.CPUcores != "0" {
stressArgs = append(stressArgs, "--cpu "+experimentDetails.CPUcores)
}
default:
log.Fatalf("stressor for %v experiment is not supported", experimentDetails.ExperimentName)
}
return stressArgs
}
// pidPath will get the pid path of the container
func pidPath(t *targetDetails, index int) cgroups.Path {
processPath := "/proc/" + strconv.Itoa(t.Pids[index]) + "/cgroup"
paths, err := parseCgroupFile(processPath, t, index)
if err != nil {
return getErrorPath(errors.Wrapf(err, "parse cgroup file %s", processPath))
}
return getExistingPath(paths, t.Pids[index], "")
}
// parseCgroupFile will read and verify the cgroup file entry of a container
func parseCgroupFile(path string, t *targetDetails, index int) (map[string]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to parse cgroup: %s", err.Error())}
}
defer file.Close()
return parseCgroupFromReader(file, t, index)
}
// parseCgroupFromReader will parse the cgroup file from the reader
func parseCgroupFromReader(r io.Reader, t *targetDetails, index int) (map[string]string, error) {
var (
cgroups = make(map[string]string)
s = bufio.NewScanner(r)
)
for s.Scan() {
var (
text = s.Text()
parts = strings.SplitN(text, ":", 3)
)
if len(parts) < 3 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("invalid cgroup entry: %q", text)}
}
for _, subs := range strings.Split(parts[1], ",") {
if subs != "" {
cgroups[subs] = parts[2]
}
}
}
if err := s.Err(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("buffer scanner failed: %s", err.Error())}
}
return cgroups, nil
}
// getExistingPath will be used to get the existing valid cgroup path
func getExistingPath(paths map[string]string, pid int, suffix string) cgroups.Path {
for n, p := range paths {
dest, err := getCgroupDestination(pid, n)
if err != nil {
return getErrorPath(err)
}
rel, err := filepath.Rel(dest, p)
if err != nil {
return getErrorPath(err)
}
if rel == "." {
rel = dest
}
paths[n] = filepath.Join("/", rel)
}
return func(name cgroups.Name) (string, error) {
root, ok := paths[string(name)]
if !ok {
if root, ok = paths[fmt.Sprintf("name=%s", name)]; !ok {
return "", cgroups.ErrControllerNotActive
}
}
if suffix != "" {
return filepath.Join(root, suffix), nil
}
return root, nil
}
}
// getErrorPath will give the invalid cgroup path
func getErrorPath(err error) cgroups.Path {
return func(_ cgroups.Name) (string, error) {
return "", err
}
}
// getCgroupDestination will validate the subsystem with the mountpath in container mountinfo file.
func getCgroupDestination(pid int, subsystem string) (string, error) {
mountinfoPath := fmt.Sprintf("/proc/%d/mountinfo", pid)
file, err := os.Open(mountinfoPath)
if err != nil {
return "", err
}
defer file.Close()
s := bufio.NewScanner(file)
for s.Scan() {
fields := strings.Fields(s.Text())
// skip malformed lines that are too short to carry the root field
if len(fields) < 4 {
continue
}
for _, opt := range strings.Split(fields[len(fields)-1], ",") {
if opt == subsystem {
return fields[3], nil
}
}
}
if err := s.Err(); err != nil {
return "", err
}
return "", errors.Errorf("no destination found for %v", subsystem)
}
// findValidCgroup will be used to get a valid cgroup path
func findValidCgroup(path cgroups.Path, t *targetDetails, index int) (string, error) {
for _, subsystem := range cgroupSubsystemList {
cgroupPath, err := path(cgroups.Name(subsystem))
if err != nil {
log.Errorf("fail to retrieve the cgroup path, subsystem: %v, target: %v, err: %v", subsystem, t.ContainerIds[index], err)
continue
}
if strings.Contains(cgroupPath, t.ContainerIds[index]) {
return cgroupPath, nil
}
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: "could not find valid cgroup"}
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.CPUcores = types.Getenv("CPU_CORES", "")
experimentDetails.CPULoad = types.Getenv("CPU_LOAD", "")
experimentDetails.FilesystemUtilizationPercentage = types.Getenv("FILESYSTEM_UTILIZATION_PERCENTAGE", "")
experimentDetails.FilesystemUtilizationBytes = types.Getenv("FILESYSTEM_UTILIZATION_BYTES", "")
experimentDetails.NumberOfWorkers = types.Getenv("NUMBER_OF_WORKERS", "")
experimentDetails.MemoryConsumption = types.Getenv("MEMORY_CONSUMPTION", "")
experimentDetails.VolumeMountPath = types.Getenv("VOLUME_MOUNT_PATH", "")
experimentDetails.StressType = types.Getenv("STRESS_TYPE", "")
}
// abortWatcher waits for the abort signal and then reverts the chaos on all targets
func abortWatcher(targets []*targetDetails, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Termination signal received, killing the stress processes")
log.Info("[Abort]: Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
// use locally scoped errors so this goroutine does not write to a shared variable
if err := terminateProcess(t); err != nil {
log.Errorf("[Abort]: unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err := result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("[Abort]: Unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getCGroupManager will return the cgroup for the given pid of the process
func getCGroupManager(t *targetDetails, index int) (interface{}, error, string) {
if cgroups.Mode() == cgroups.Unified {
groupPath := ""
output, err := exec.Command("bash", "-c", fmt.Sprintf("nsenter -t 1 -C -m -- cat /proc/%v/cgroup", t.Pids[index])).CombinedOutput()
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to get the cgroup: %s: %s", err.Error(), string(output))}, ""
}
log.Infof("cgroup output: %s", string(output))
parts := strings.Split(string(output), ":")
if len(parts) < 3 {
return nil, fmt.Errorf("invalid cgroup entry: %s", string(output)), ""
}
if strings.HasSuffix(parts[len(parts)-3], "0") && parts[len(parts)-2] == "" {
groupPath = parts[len(parts)-1]
}
log.Infof("group path: %s", groupPath)
cgroup2, err := cgroupsv2.LoadManager("/sys/fs/cgroup", string(groupPath))
if err != nil {
return nil, errors.Errorf("Error loading cgroup v2 manager, %v", err), ""
}
return cgroup2, nil, groupPath
}
path := pidPath(t, index)
cgroup, err := findValidCgroup(path, t, index)
if err != nil {
return nil, stacktrace.Propagate(err, "could not find valid cgroup"), ""
}
cgroup1, err := cgroups.Load(cgroups.V1, cgroups.StaticPath(cgroup))
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to load the cgroup: %s", err.Error())}, ""
}
return cgroup1, nil, ""
}
// addProcessToCgroup will add the process to cgroup
// By default it will add to v1 cgroup
func addProcessToCgroup(pid int, control interface{}, groupPath string) error {
if cgroups.Mode() == cgroups.Unified {
args := []string{"-t", "1", "-C", "--", "sudo", "sh", "-c", fmt.Sprintf("echo %d >> /sys/fs/cgroup%s/cgroup.procs", pid, strings.ReplaceAll(groupPath, "\n", ""))}
output, err := exec.Command("nsenter", args...).CombinedOutput()
if err != nil {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("failed to add process to cgroup %s: %v", string(output), err),
}
}
return nil
}
var cgroup1 = control.(cgroups.Cgroup)
return cgroup1.Add(cgroups.Process{Pid: pid})
}
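// injectChaos starts the stress process in paused mode inside the namespaces of the target container,
// adds it to the container's cgroup, and then resumes it
func injectChaos(t *targetDetails, stressors string, index int, stressType string) (*Command, error) {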
stressCommand := fmt.Sprintf("pause nsutil -t %v -p -- %v", strconv.Itoa(t.Pids[index]), stressors)
// for io stress, we need to enter the mount namespace of the target container,
// which is enabled by passing the -m flag
if stressType == "pod-io-stress" {
stressCommand = fmt.Sprintf("pause nsutil -t %v -p -m -- %v", strconv.Itoa(t.Pids[index]), stressors)
}
log.Infof("[Info]: starting process: %v", stressCommand)
// launch the stress-ng process on the target container in paused mode
cmd := exec.Command("/bin/bash", "-c", stressCommand)
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
// write the process output straight into the buffer of the returned Command,
// so that output produced after this function returns remains readable
command := &Command{Cmd: cmd}
cmd.Stdout = &command.Buffer
cmd.Stderr = &command.Buffer
err = cmd.Start()
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("failed to start stress process: %s", err.Error())}
}
// add the stress process to the cgroup of target container
if err = addProcessToCgroup(cmd.Process.Pid, t.CGroupManagers[index], t.GroupPath); err != nil {
if killErr := cmd.Process.Kill(); killErr != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup %s and kill stress process: %s", err.Error(), killErr.Error())}
}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup: %s", err.Error())}
}
log.Info("[Info]: Sending signal to resume the stress process")
// wait for the process to start before sending the resume signal
// TODO: need a dynamic way to check the start of the process
time.Sleep(700 * time.Millisecond)
// remove pause and resume or start the stress process
if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to remove pause and start the stress process: %s", err.Error())}
}
return command, nil
}
type targetDetails struct {
Name string
Namespace string
TargetContainers []string
ContainerIds []string
Pids []int
CGroupManagers []interface{}
Cmds []*Command
Source string
GroupPath string
}
type Command struct {
Cmd *exec.Cmd
Buffer bytes.Buffer
}
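Because `Command` holds its `Buffer` by value, the writer wired to the process's stdout and stderr has to be that very field; a separate `bytes.Buffer` copied into the struct afterwards does not see later writes. The sketch below shows the difference with a hypothetical `holder` type standing in for `Command`.

```go
package main

import (
	"bytes"
	"fmt"
)

// holder mimics a struct that stores a bytes.Buffer by value.
type holder struct {
	Buffer bytes.Buffer
}

// copied stores a copy of buf in the struct and writes to buf afterwards;
// the copy never sees that write.
func copied() string {
	var buf bytes.Buffer
	h := &holder{Buffer: buf}
	buf.WriteString("late output")
	return h.Buffer.String()
}

// shared writes through a pointer to the struct's own field,
// so the data is visible via the struct.
func shared() string {
	h := &holder{}
	w := &h.Buffer
	w.WriteString("late output")
	return h.Buffer.String()
}

func main() {
	fmt.Printf("copied=%q shared=%q\n", copied(), shared())
}
```

An alternative with the same effect would be to store a `*bytes.Buffer` in the struct, at the cost of changing the field type for every user of it.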

@ -1,318 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareAndInjectStressChaos contains the preparation & injection steps for the stress experiments.
func PrepareAndInjectStressChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodStressFault")
defer span.End()
var err error
//Set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
switch experimentsDetails.StressType {
case "pod-cpu-stress":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"CPU Core": experimentsDetails.CPUcores,
"CPU Load Percentage": experimentsDetails.CPULoad,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "pod-memory-stress":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Number of Workers": experimentsDetails.NumberOfWorkers,
"Memory Consumption": experimentsDetails.MemoryConsumption,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "pod-io-stress":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
}
// Get the target pod details for the chaos execution.
// If no target pods are specified, a random list of target pods is derived using the pods-affected percentage.
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the stress chaos into all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform the stress chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode injects the stress chaos into all target applications in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
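Both modes hand the targets to the helper pod as one string: each target is written as `name:namespace:container`, and in parallel mode the per-node targets are joined with `;`. The sketch below round-trips that encoding. `target`, `encodeTargets` and `decodeTargets` are hypothetical names, and the decoding side is an assumption about how a consumer could read the value, since the helper's own parsing is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// target is a hypothetical stand-in for one pod/container pair.
type target struct {
	Name, Namespace, Container string
}

// encodeTargets renders targets as "name:namespace:container" joined by ";".
func encodeTargets(ts []target) string {
	parts := make([]string, 0, len(ts))
	for _, t := range ts {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	return strings.Join(parts, ";")
}

// decodeTargets reverses encodeTargets and rejects malformed items.
func decodeTargets(s string) ([]target, error) {
	var out []target
	for _, item := range strings.Split(s, ";") {
		f := strings.Split(item, ":")
		if len(f) != 3 {
			return nil, fmt.Errorf("malformed target %q", item)
		}
		out = append(out, target{Name: f[0], Namespace: f[1], Container: f[2]})
	}
	return out, nil
}

func main() {
	env := encodeTargets([]target{
		{Name: "web-0", Namespace: "default", Container: "nginx"},
		{Name: "web-1", Namespace: "default", Container: "nginx"},
	})
	fmt.Println(env)
	ts, err := decodeTargets(env)
	fmt.Println(len(ts), err)
}
```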
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodStressFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "socket-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
{
Name: "sys-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/sys",
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"./helpers -name stress-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "socket-path",
MountPath: experimentsDetails.SocketPath,
},
{
Name: "sys-path",
MountPath: "/sys",
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
RunAsUser: ptrint64(0),
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env variables required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("CPU_CORES", experimentsDetails.CPUcores).
SetEnv("CPU_LOAD", experimentsDetails.CPULoad).
SetEnv("FILESYSTEM_UTILIZATION_PERCENTAGE", experimentsDetails.FilesystemUtilizationPercentage).
SetEnv("FILESYSTEM_UTILIZATION_BYTES", experimentsDetails.FilesystemUtilizationBytes).
SetEnv("NUMBER_OF_WORKERS", experimentsDetails.NumberOfWorkers).
SetEnv("MEMORY_CONSUMPTION", experimentsDetails.MemoryConsumption).
SetEnv("VOLUME_MOUNT_PATH", experimentsDetails.VolumeMountPath).
SetEnv("STRESS_TYPE", experimentsDetails.StressType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
// ptrint64 returns a pointer to the given int64 value
func ptrint64(p int64) *int64 {
return &p
}
// SetChaosTunables resolves each tunable that is given as a range to a random value within that range.
// A tunable that is not given as a range keeps the value that was provided.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.CPUcores = common.ValidateRange(experimentsDetails.CPUcores)
experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)
experimentsDetails.MemoryConsumption = common.ValidateRange(experimentsDetails.MemoryConsumption)
experimentsDetails.NumberOfWorkers = common.ValidateRange(experimentsDetails.NumberOfWorkers)
experimentsDetails.FilesystemUtilizationPercentage = common.ValidateRange(experimentsDetails.FilesystemUtilizationPercentage)
experimentsDetails.FilesystemUtilizationBytes = common.ValidateRange(experimentsDetails.FilesystemUtilizationBytes)
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}

@ -1,264 +0,0 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/vmware"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/vmware/vm-poweroff/types"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var inject, abort chan os.Signal
// InjectVMPowerOffChaos injects the chaos in serial or parallel mode
func InjectVMPowerOffChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, cookie string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMPowerOffFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Fetching the target VM Ids
vmIdList := strings.Split(experimentsDetails.VMIds, ",")
// Start the abortWatcher goroutine, which watches for the abort signal and generates the required events and result
go abortWatcher(experimentsDetails, vmIdList, clients, resultDetails, chaosDetails, eventsDetails, cookie)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stops VMs in serial mode i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the timestamp at which the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM Id list, %v", vmIdList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, vmId := range vmIdList {
//Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
}
common.SetTargets(vmId, "injected", "VM", chaosDetails)
//Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "VM shutdown failed")
}
//Run the probes during the chaos
//The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "failed to start back vm")
}
//Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "vm failed to start")
}
common.SetTargets(vmId, "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops VMs in parallel mode i.e. all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the timestamp at which the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM Id list, %v", vmIdList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for _, vmId := range vmIdList {
//Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
}
common.SetTargets(vmId, "injected", "VM", chaosDetails)
}
for _, vmId := range vmIdList {
//Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "vm failed to shutdown")
}
}
//Running the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Waiting for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
for _, vmId := range vmIdList {
//Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, fmt.Sprintf("failed to start back %s vm", vmId))
}
}
for _, vmId := range vmIdList {
//Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "vm failed to successfully start")
}
}
for _, vmId := range vmIdList {
common.SetTargets(vmId, "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, cookie string) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, vmId := range vmIdList {
vmStatus, err := vmware.GetVMStatus(experimentsDetails.VcenterServer, vmId, cookie)
if err != nil {
log.Errorf("failed to get vm status of %s when an abort signal is received: %s", vmId, err.Error())
}
if vmStatus != "POWERED_ON" {
log.Infof("[Abort]: Waiting for the VM %s to shutdown", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
log.Errorf("vm %s failed to successfully shutdown when an abort signal was received: %s", vmId, err.Error())
}
log.Infof("[Abort]: Starting %s VM as abort signal has been received", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
log.Errorf("vm %s failed to start when an abort signal was received: %s", vmId, err.Error())
}
}
common.SetTargets(vmId, "reverted", "VM", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
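
The two injection modes above differ only in how the stop and start calls are ordered: serial mode completes the full cycle for one VM before touching the next, while parallel mode runs each phase across all VMs before moving to the next phase (it does not use goroutines). The sketch below illustrates that ordering. The powerController interface and the recorder fake are assumptions made for this example and are not part of litmus-go.

```go
package main

import "fmt"

// powerController is a hypothetical interface standing in for the
// vmware.StopVM and vmware.StartVM calls used in the file above.
type powerController interface {
	Stop(id string) error
	Start(id string) error
}

// recorder is a fake controller that only records the order of calls.
type recorder struct{ calls []string }

func (r *recorder) Stop(id string) error  { r.calls = append(r.calls, "stop "+id); return nil }
func (r *recorder) Start(id string) error { r.calls = append(r.calls, "start "+id); return nil }

// cycleSerial finishes the stop/start cycle for one VM before the next.
func cycleSerial(c powerController, ids []string) error {
	for _, id := range ids {
		if err := c.Stop(id); err != nil {
			return fmt.Errorf("failed to stop %s: %w", id, err)
		}
		if err := c.Start(id); err != nil {
			return fmt.Errorf("failed to start %s: %w", id, err)
		}
	}
	return nil
}

// cycleParallel stops every VM first and only then starts every VM.
func cycleParallel(c powerController, ids []string) error {
	for _, id := range ids {
		if err := c.Stop(id); err != nil {
			return fmt.Errorf("failed to stop %s: %w", id, err)
		}
	}
	for _, id := range ids {
		if err := c.Start(id); err != nil {
			return fmt.Errorf("failed to start %s: %w", id, err)
		}
	}
	return nil
}

func main() {
	ids := []string{"vm-1", "vm-2"}
	s, p := &recorder{}, &recorder{}
	_ = cycleSerial(s, ids)
	_ = cycleParallel(p, ids)
	fmt.Println(s.calls) // [stop vm-1 start vm-1 stop vm-2 start vm-2]
	fmt.Println(p.calls) // [stop vm-1 stop vm-2 start vm-1 start vm-2]
}
```

The wait, probe, and chaos-interval steps are left out here to keep the ordering visible; in the file above they sit between the stop and start phases.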

View File

@ -0,0 +1,269 @@
package lib
import (
"strconv"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/openebs/maya/pkg/util/retry"
"github.com/pkg/errors"
appsv1 "k8s.io/api/apps/v1"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.ChaosServiceAccount == "" {
// Getting the serviceAccountName for the powerfulseal pod
err := GetServiceAccount(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
}
}
// generating a unique string that is appended to the powerfulseal deployment name and labels for unique identification
runID := common.GetRunID()
// generating the chaos inject event in the chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Creating configmap for powerfulseal deployment
err := CreateConfigMap(experimentsDetails, clients, runID)
if err != nil {
return err
}
// Creating powerfulseal deployment
err = CreatePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
//checking the status of the powerfulseal pod, wait till the powerfulseal pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "name=powerfulseal-"+runID, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("powerfulseal pod is not in running state, err: %v", err)
}
// Wait for Chaos Duration
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
//Deleting the powerfulseal deployment
log.Info("[Cleanup]: Deleting the powerfulseal deployment")
err = DeletePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal deployment, err: %v", err)
}
//Deleting the powerfulseal configmap
log.Info("[Cleanup]: Deleting the powerfulseal configmap")
err = DeletePowerfulsealConfigmap(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal configmap, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// GetServiceAccount finds the serviceAccountName for the powerfulseal deployment
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
// CreateConfigMap creates a configmap for the powerfulseal deployment
func CreateConfigMap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
data := make(map[string]string, 0)
// It stores all the details inside a single string in a well-formatted way
policy := GetConfigMapData(experimentsDetails)
data["policy"] = policy
configMap := &apiv1.ConfigMap{
ObjectMeta: v1.ObjectMeta{
Name: "policy-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"name": "policy-" + runID,
},
},
Data: data,
}
_, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Create(configMap)
return err
}
// GetConfigMapData generates the configmap data for the powerfulseal deployment in the desired format
func GetConfigMapData(experimentsDetails *experimentTypes.ExperimentDetails) string {
policy := "config:" + "\n" +
" minSecondsBetweenRuns: 1" + "\n" +
" maxSecondsBetweenRuns: " + strconv.Itoa(experimentsDetails.ChaosInterval) + "\n" +
"podScenarios:" + "\n" +
" - name: \"delete random pods in application namespace\"" + "\n" +
" match:" + "\n" +
" - labels:" + "\n" +
" namespace: " + experimentsDetails.AppNS + "\n" +
" selector: " + experimentsDetails.AppLabel + "\n" +
" filters:" + "\n" +
" - randomSample:" + "\n" +
" size: 1" + "\n" +
" actions:" + "\n" +
" - kill:" + "\n" +
" probability: 0.77" + "\n" +
" force: " + strconv.FormatBool(experimentsDetails.Force)
return policy
}
// CreatePowerfulsealDeployment derives the attributes for the powerfulseal deployment and creates it
func CreatePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
deployment := &appsv1.Deployment{
ObjectMeta: v1.ObjectMeta{
Name: "powerfulseal-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": "powerfulseal",
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Spec: appsv1.DeploymentSpec{
Selector: &v1.LabelSelector{
MatchLabels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Replicas: func(i int32) *int32 { return &i }(1),
Template: apiv1.PodTemplateSpec{
ObjectMeta: v1.ObjectMeta{
Labels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Spec: apiv1.PodSpec{
Volumes: []apiv1.Volume{
{
Name: "policyfile",
VolumeSource: apiv1.VolumeSource{
ConfigMap: &apiv1.ConfigMapVolumeSource{
LocalObjectReference: apiv1.LocalObjectReference{
Name: "policy-" + runID,
},
},
},
},
},
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: func(i int64) *int64 { return &i }(0),
Containers: []apiv1.Container{
{
Name: "powerfulseal",
Image: "ksatchit/miko-powerfulseal:non-ssh",
Args: []string{
"autonomous",
"--inventory-kubernetes",
"--no-cloud",
"--policy-file=/root/policy_kill_random_default.yml",
"--use-pod-delete-instead-of-ssh-kill",
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "policyfile",
MountPath: "/root/policy_kill_random_default.yml",
SubPath: "policy",
},
},
},
},
},
},
},
}
_, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Create(deployment)
return err
}
// DeletePowerfulsealDeployment deletes the powerfulseal deployment and waits until it is gone
func DeletePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Delete("powerfulseal-"+runID, &v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).List(v1.ListOptions{LabelSelector: "name=powerfulseal-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("Deployment is not deleted yet, err: %v", err)
}
return nil
})
return err
}
// DeletePowerfulsealConfigmap deletes the powerfulseal configmap and waits until it is gone
func DeletePowerfulsealConfigmap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Delete("policy-"+runID, &v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).List(v1.ListOptions{LabelSelector: "name=policy-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("configmap is not deleted yet, err: %v", err)
}
return nil
})
return err
}
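
Both delete helpers above follow the same pattern: issue the delete, then poll up to 90 times, one second apart, until a list call comes back empty. A minimal sketch of such a bounded retry loop is shown below. retryUntil and errNotDeleted are hypothetical names for this example; this is not the API of the retry package imported above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotDeleted is a hypothetical sentinel used only in this sketch.
var errNotDeleted = errors.New("resource is not deleted yet")

// retryUntil calls try up to times times, sleeping wait between attempts.
// It returns nil on the first success, or the last error if every attempt fails.
func retryUntil(times int, wait time.Duration, try func(attempt int) error) error {
	var err error
	for attempt := 0; attempt < times; attempt++ {
		if err = try(attempt); err == nil {
			return nil
		}
		// No need to sleep after the final attempt.
		if attempt < times-1 {
			time.Sleep(wait)
		}
	}
	return err
}

func main() {
	remaining := 3 // pretend the resource is still listed for three polls
	err := retryUntil(90, 10*time.Millisecond, func(attempt int) error {
		if remaining > 0 {
			remaining--
			return errNotDeleted
		}
		return nil
	})
	fmt.Println(err) // <nil>
}
```

In the real helpers, the closure would wrap the list call and return an error while the list is non-empty.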

View File

@ -0,0 +1,231 @@
package lib
import (
"strconv"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Record the restart count of the target container before chaos injection
restartCountBefore := GetRestartCount(targetPodList, experimentsDetails.TargetContainer)
log.Infof("restartCount of target containers before chaos injection: %v", restartCountBefore)
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// Verify that the restart count of the target container has increased after chaos injection
err = VerifyRestartCount(experimentsDetails, targetPodList, clients, restartCountBefore)
if err != nil {
return errors.Errorf("Target container is not restarted, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod("app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
//GetTargetContainer will fetch the container name from application pod
//This container will be used as target container
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
return pod.Spec.Containers[0].Name, nil
}
// GetRestartCount returns the restart count of the target container in each target pod
func GetRestartCount(targetPodList apiv1.PodList, containerName string) []int {
restartCount := []int{}
for _, pod := range targetPodList.Items {
for _, container := range pod.Status.ContainerStatuses {
if container.Name == containerName {
restartCount = append(restartCount, int(container.RestartCount))
break
}
}
}
return restartCount
}
// VerifyRestartCount verifies that the target container was restarted after chaos injection,
// i.e. that its restart count is higher than it was before the injection
func VerifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podList apiv1.PodList, clients clients.ClientSets, restartCountBefore []int) error {
restartCountAfter := []int{}
err := retry.
Times(90).
Wait(1 * time.Second).
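Try(func(attempt uint) error {
// reset on every attempt so that a retried attempt does not append to stale counts
restartCountAfter = restartCountAfter[:0]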
for index := range podList.Items {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podList.Items[index].Name, v1.GetOptions{})
if err != nil {
return err
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
restartCountAfter = append(restartCountAfter, int(container.RestartCount))
break
}
}
}
return nil
})
if err != nil {
return err
}
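// guard against indexing past the end if a container status was not found
if len(restartCountAfter) != len(restartCountBefore) {
return errors.Errorf("Unable to get the restart count of all target containers")
}
for index := range restartCountBefore {
// fail if the restart count did not increase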
if restartCountAfter[index] <= restartCountBefore[index] {
return errors.Errorf("Target container is not restarted")
}
}
log.Infof("restartCount of target container after chaos injection: %v", restartCountAfter)
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper",
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/docker.sock",
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullAlways,
Command: []string{
"pumba",
},
Args: []string{
"--random",
"--interval",
strconv.Itoa(experimentsDetails.ChaosInterval) + "s",
"kill",
"--signal",
"SIGKILL",
"re2:k8s_" + experimentsDetails.TargetContainer + "_" + appName,
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: "/var/run/docker.sock",
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
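
The restart check in VerifyRestartCount compares two slices index by index, which only works if both have one entry per target pod. The sketch below isolates that comparison with an explicit length guard. verifyRestarts is a hypothetical helper for illustration, not a function from litmus-go.

```go
package main

import "fmt"

// verifyRestarts reports an error unless every entry in after is strictly
// greater than the entry at the same index in before.
func verifyRestarts(before, after []int) error {
	if len(before) != len(after) {
		return fmt.Errorf("restart count mismatch: %d entries before, %d after", len(before), len(after))
	}
	for i := range before {
		if after[i] <= before[i] {
			return fmt.Errorf("container %d was not restarted (before=%d, after=%d)", i, before[i], after[i])
		}
	}
	return nil
}

func main() {
	fmt.Println(verifyRestarts([]int{0, 1}, []int{1, 2})) // <nil>
	fmt.Println(verifyRestarts([]int{0, 1}, []int{1, 1})) // container 1 was not restarted (before=1, after=1)
}
```

Keeping the comparison in a pure function like this makes it easy to unit test without a cluster.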

View File

@ -0,0 +1,169 @@
package lib
import (
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
// PreparePodCPUHog contains the preparation steps before chaos injection
func PreparePodCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// creating the helper pod to perform cpu chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"CPUcores": experimentsDetails.CPUcores,
})
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", clients, experimentsDetails.ChaosDuration+30, "pumba-stress")
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed, status: %v, err: %v", podStatus, err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod("app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper",
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
// prevent pumba from killing itself
"com.gaiaadm.pumba": "true",
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/docker.sock",
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Args: GetContainerArguments(experimentsDetails, appName),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: "/var/run/docker.sock",
},
},
ImagePullPolicy: apiv1.PullAlways,
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// GetContainerArguments derives the args for the pumba stress helper pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stressors",
"--cpu " + strconv.Itoa(experimentsDetails.CPUcores) + " --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}
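A minimal sketch of how the argument list above is laid out, written without the litmus-go imports so it runs on its own. `stressArgs` and its plain `int` parameters are stand-ins (assumptions) for `GetContainerArguments` and the `ExperimentDetails` fields. The point it illustrates is that the whole stressor string travels as a single argv element after `--stressors`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// stressArgs mirrors the shape of GetContainerArguments above.
// The parameters replace the ExperimentDetails fields (assumption for the sketch).
func stressArgs(appName string, cpuCores, durationSec int) []string {
	d := strconv.Itoa(durationSec) + "s"
	return []string{
		"--log-level", "debug",
		"--label", "io.kubernetes.pod.name=" + appName,
		"stress",
		"--duration", d,
		// one argv element: the stressor flags are not split into separate args
		"--stressors", "--cpu " + strconv.Itoa(cpuCores) + " --timeout " + d,
	}
}

func main() {
	args := stressArgs("nginx-abc", 2, 60)
	fmt.Println(len(args))
	// join with a visible separator so element boundaries are easy to see
	fmt.Println(strings.Join(args, " | "))
}
```

The pod name `nginx-abc` is a placeholder; in the source the value comes from the target pod selected by `common.GetPodList`.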


@ -0,0 +1,169 @@
package lib
import (
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
// PreparePodMemoryHog contains preparation steps before chaos injection
func PreparePodMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// creating the helper pod to perform memory chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"MemoryBytes": experimentsDetails.MemoryConsumption,
})
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", clients, experimentsDetails.ChaosDuration+30, "pumba-stress")
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod("app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper",
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
// prevent pumba from killing itself
"com.gaiaadm.pumba": "true",
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/docker.sock",
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Args: GetContainerArguments(experimentsDetails, appName),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: "/var/run/docker.sock",
},
},
ImagePullPolicy: apiv1.PullAlways,
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// GetContainerArguments derives the args for the pumba stress helper pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stressors",
"--cpu 1 --vm 1 --vm-bytes " + strconv.Itoa(experimentsDetails.MemoryConsumption) + "M --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}


@ -0,0 +1,44 @@
package corruption
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := GetContainerArguments(experimentsDetails)
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// GetContainerArguments derives the args for the pumba pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
baseArgs := []string{
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args = network_chaos.AddTargetIpsArgs(experimentsDetails.TargetIPs, args)
args = network_chaos.AddTargetIpsArgs(network_chaos.GetIpsForTargetHosts(experimentsDetails.TargetHosts), args)
args = append(args, "corrupt", "--percent", strconv.Itoa(experimentsDetails.NetworkPacketCorruptionPercentage))
return args
}


@ -0,0 +1,44 @@
package duplication
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := GetContainerArguments(experimentsDetails)
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// GetContainerArguments derives the args for the pumba pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
baseArgs := []string{
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args = network_chaos.AddTargetIpsArgs(experimentsDetails.TargetIPs, args)
args = network_chaos.AddTargetIpsArgs(network_chaos.GetIpsForTargetHosts(experimentsDetails.TargetHosts), args)
args = append(args, "duplicate", "--percent", strconv.Itoa(experimentsDetails.NetworkPacketDuplicationPercentage))
return args
}


@ -0,0 +1,44 @@
package latency
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := GetContainerArguments(experimentsDetails)
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// GetContainerArguments derives the args for the pumba pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
baseArgs := []string{
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args = network_chaos.AddTargetIpsArgs(experimentsDetails.TargetIPs, args)
args = network_chaos.AddTargetIpsArgs(network_chaos.GetIpsForTargetHosts(experimentsDetails.TargetHosts), args)
args = append(args, "delay", "--time", strconv.Itoa(experimentsDetails.NetworkLatency))
return args
}


@ -0,0 +1,44 @@
package loss
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := GetContainerArguments(experimentsDetails)
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// GetContainerArguments derives the args for the pumba pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
baseArgs := []string{
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args = network_chaos.AddTargetIpsArgs(experimentsDetails.TargetIPs, args)
args = network_chaos.AddTargetIpsArgs(network_chaos.GetIpsForTargetHosts(experimentsDetails.TargetHosts), args)
args = append(args, "loss", "--percent", strconv.Itoa(experimentsDetails.NetworkPacketLossPercentage))
return args
}
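The corruption, duplication, latency and loss packages above build the same `netem` prefix and differ only in the trailing subcommand and its flag (`corrupt`, `duplicate` and `loss` take `--percent`; `delay` takes `--time`). A rough sketch of that shared shape follows. `netemArgs` is a hypothetical helper, not a litmus-go function, and the image name and interface in `main` are placeholder values.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// netemArgs assembles: netem <common flags> [--target ip]... <sub> <flag> <value>.
// It folds the per-experiment GetContainerArguments functions into one
// parameterised function purely for illustration.
func netemArgs(tcImage, iface string, durationSec int, targets []string, sub, flag string, value int) []string {
	args := []string{
		"netem",
		"--tc-image", tcImage,
		"--interface", iface,
		"--duration", strconv.Itoa(durationSec) + "s",
	}
	// one --target flag per IP, trimmed like AddTargetIpsArgs does
	for _, t := range targets {
		args = append(args, "--target", strings.TrimSpace(t))
	}
	return append(args, sub, flag, strconv.Itoa(value))
}

func main() {
	fmt.Println(netemArgs("gaiadocker/iproute2", "eth0", 60, []string{"10.0.0.1"}, "loss", "--percent", 100))
	fmt.Println(netemArgs("gaiadocker/iproute2", "eth0", 60, nil, "delay", "--time", 2000))
}
```

In the source, the container-name regex (`re2:k8s_POD_...`) is appended later by `PrepareAndInjectChaos`, so it is deliberately left out of this sketch.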


@ -0,0 +1,181 @@
package lib
import (
"net"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
//PrepareAndInjectChaos contains the preparation and chaos injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args []string) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
})
// args contains details of the specific chaos injection
// constructing `argsWithRegex` based on updated regex with a diff pod name
// without extending/concatenating the args var itself
argsWithRegex := append(args, "re2:k8s_POD_"+pod.Name+"_"+experimentsDetails.AppNS)
log.Infof("Arguments for running %v are %v", experimentsDetails.ExperimentName, argsWithRegex)
err = CreateHelperPod(experimentsDetails, clients, pod.Spec.NodeName, runID, argsWithRegex)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", clients, experimentsDetails.ChaosDuration+30, chaosDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod("app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appNodeName, runID string, args []string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper",
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/docker.sock",
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullAlways,
Command: []string{
"pumba",
},
Args: args,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: "/var/run/docker.sock",
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// AddTargetIpsArgs inserts a comma-separated list of targetIPs (if provided by the user) into the pumba command/args
func AddTargetIpsArgs(targetIPs string, args []string) []string {
if targetIPs == "" {
return args
}
ips := strings.Split(targetIPs, ",")
for i := range ips {
args = append(args, "--target", strings.TrimSpace(ips[i]))
}
return args
}
// GetIpsForTargetHosts resolves IP addresses for comma-separated list of target hosts and returns comma-separated ips
func GetIpsForTargetHosts(targetHosts string) string {
if targetHosts == "" {
return ""
}
hosts := strings.Split(targetHosts, ",")
var commaSeparatedIPs []string
for i := range hosts {
host := strings.TrimSpace(hosts[i])
ips, err := net.LookupIP(host)
if err != nil {
log.Infof("Unable to resolve host %v, err: %v", host, err)
} else {
for j := range ips {
log.Infof("IP address: %v", ips[j])
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String())
}
}
}
return strings.Join(commaSeparatedIPs, ",")
}
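The comment in `PrepareAndInjectChaos` about building `argsWithRegex` without extending `args` touches a general Go slice property: `append` on a slice with spare capacity writes into the shared backing array, so two results derived from the same base can alias each other. Since the source hands each `argsWithRegex` to `CreateHelperPod` before the next iteration, this may well be harmless there. The sketch below only illustrates the hazard and one defensive alternative; `withSuffix` is a hypothetical helper.

```go
package main

import "fmt"

// withSuffix copies base into a fresh slice before appending,
// so the result never shares a backing array with base.
func withSuffix(base []string, suffix string) []string {
	out := make([]string, 0, len(base)+1)
	out = append(out, base...)
	return append(out, suffix)
}

func main() {
	// spare capacity, as a slice grown by earlier appends often has
	base := make([]string, 2, 4)
	base[0], base[1] = "netem", "loss"

	a := append(base, "re2:pod-a")
	b := append(base, "re2:pod-b") // overwrites the slot a[2] points at
	fmt.Println(a[2], b[2])        // prints: re2:pod-b re2:pod-b

	c := withSuffix(base, "re2:pod-a")
	d := withSuffix(base, "re2:pod-b")
	fmt.Println(c[2], d[2]) // prints: re2:pod-a re2:pod-b
}
```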


@ -0,0 +1,187 @@
package lib
import (
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-io-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
// PreparePodIOStress contains preparation steps before chaos injection
func PreparePodIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
targetPodList, err := common.GetPodList(experimentsDetails.AppNS, experimentsDetails.TargetPod, experimentsDetails.AppLabel, experimentsDetails.PodsAffectedPerc, clients)
if err != nil {
return errors.Errorf("Unable to get the target pod list, err: %v", err)
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// creating the helper pod to perform io stress chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
})
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, "app="+experimentsDetails.ExperimentName+"-helper", clients, experimentsDetails.ChaosDuration+30, "pumba-stress")
if err != nil || podStatus == "Failed" {
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod("app="+experimentsDetails.ExperimentName+"-helper", experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper",
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
// prevent pumba from killing itself
"com.gaiaadm.pumba": "true",
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/docker.sock",
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Args: GetContainerArguments(experimentsDetails, appName),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: "/var/run/docker.sock",
},
},
ImagePullPolicy: apiv1.PullAlways,
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// GetContainerArguments derives the args for the pumba stress helper pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
var hddbytes string
if experimentsDetails.FilesystemUtilizationBytes == 0 {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = "10%"
log.Info("Neither FilesystemUtilizationPercentage nor FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
}
} else {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationBytes) + "G"
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
stressArgs := []string{
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stressors",
"--cpu 1 --io " + strconv.Itoa(experimentsDetails.NumberOfWorkers) + " --hdd " + strconv.Itoa(experimentsDetails.NumberOfWorkers) + " --hdd-bytes " + hddbytes + " --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}
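The nested `if` in `GetContainerArguments` above encodes a precedence rule for the `--hdd-bytes` value: the percentage wins whenever it is set, the byte count (suffixed `G`) is used only when the percentage is zero, and `10%` is the fallback. A compact sketch of the same rule, with logging left out, is shown below. `hddBytes` is a hypothetical name, and its two `int` parameters stand in for the `FilesystemUtilizationPercentage` and `FilesystemUtilizationBytes` fields.

```go
package main

import (
	"fmt"
	"strconv"
)

// hddBytes picks the --hdd-bytes value with the same precedence
// as the nested if/else in the source.
func hddBytes(percentage, gigabytes int) string {
	switch {
	case percentage != 0:
		// percentage takes priority, even if gigabytes is also set
		return strconv.Itoa(percentage) + "%"
	case gigabytes != 0:
		return strconv.Itoa(gigabytes) + "G"
	default:
		// neither provided: fall back to 10%
		return "10%"
	}
}

func main() {
	fmt.Println(hddBytes(0, 0))
	fmt.Println(hddBytes(25, 0))
	fmt.Println(hddBytes(0, 5))
	fmt.Println(hddBytes(25, 5))
}
```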

Some files were not shown because too many files have changed in this diff