Compare commits


177 Commits

Author SHA1 Message Date
Neelanjan Manna e7b4e7dbe4
chore: adds retries with timeout for litmus and k8s client operations (#766)
* chore: adds retries for k8s api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds retries for litmus api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 15:41:34 +05:30
Neelanjan Manna 62a4986c78
chore: adds common functions for helper pod lifecycle management (#764)
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 12:18:29 +05:30
Neelanjan Manna d626cf3ec4
Merge pull request #761 from litmuschaos/CHAOS-9404
feat: adds port filtering for ip/hostnames for network faults, adds pod-network-rate-limit fault
2025-08-13 16:40:51 +05:30
neelanjan00 59125424c3
feat: adds ip+port filtering, adds pod-network-rate-limit fault
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-13 16:13:24 +05:30
Neelanjan Manna 2e7ff836fc
feat: Adds multi container support for pod stress faults (#757)
* chore: Fix typo in log statement

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds multi-container stress chaos system with improved lifecycle management and better error handling

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-08-13 16:04:20 +05:30
Prexy e61d5b33be
written test for `workload.go` in `pkg/workloads` (#767)
* written test for workload.go in pkg/workloads

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* checking go formatting

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-08-12 17:30:22 +05:30
Prexy 14fe30c956
test: add unit tests for exec.go file in pkg/utils folder (#755)
* test: add unit tests for exec.go file in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* creating table driven test and also updates TestCheckPodStatus

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-07-24 15:33:25 +05:30
Prexy 4ae08899e0
test: add unit tests for retry.go in pkg/utils folder (#754)
* test: add unit tests for retry.go in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:42 +05:30
Prexy 2c38220cca
test: add unit tests for RandStringBytesMask and GetRunID in stringutils (#753)
* test: add unit tests for RandStringBytesMask and GetRunID in stringutils

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:26 +05:30
Sami S. 07de11eeee
Fix: handle pagination in ssm describeInstanceInformation & API Rate Limit (#738)
* Fix: handle pagination in ssm describe

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* implement exponential backoff with jitter for API rate limiting

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Refactor

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixup

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Fix: include error message from stderr if container-kill fails (#740) (#741)

Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(logs): Fix the error logs for container-kill fault (#745)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(container-kill): Fixed the container stop command timeout issue (#747)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* feat: Add a rds-instance-stop chaos fault (#710)

* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix go fmt ./...

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Filter instances on api call

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixes lint

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>

---------

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>
Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Jongwoo Han <jongwooo.han@gmail.com>
Co-authored-by: Udit Gaurav <udit.gaurav@harness.io>
2025-04-30 10:25:10 +05:30
Jongwoo Han 5c22472290
feat: Add a rds-instance-stop chaos fault (#710)
* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2025-04-24 12:54:05 +05:30
Shubham Chaudhary e7b3fb6f9f
fix(container-kill): Fixed the container stop command timeout issue (#747)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-04-15 18:20:23 +05:30
Shubham Chaudhary e1eaea9110
fix(logs): Fix the error logs for container-kill fault (#745)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-04-03 15:35:00 +05:30
Björn Kylberg 491dc5e31a
Fix: include error message from stderr if container-kill fails (#740) (#741)
Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
2025-04-03 14:44:05 +05:30
Shubham Chaudhary caae228e35
(chore): fix the go fmt of the files (#734)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-01-17 12:08:34 +05:30
kbfu 34a62d87f3
fix the cgroup 2 problem (#677)
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-01-17 11:29:30 +05:30
Suhyen Im 8246ff891b
feat: propagate trace context to helper pods (#722)
Signed-off-by: Suhyen Im <suhyenim.kor@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Saranya Jena <saranya.jena@harness.io>
2025-01-15 16:34:19 +05:30
Namkyu Park 9b29558585
feat: export k6 results output to the OTEL collector (#726)
* Export k6 results to the otel collector

Signed-off-by: namkyu1999 <lak9348@gmail.com>

* add envs for multiple projects

Signed-off-by: namkyu1999 <lak9348@gmail.com>

---------

Signed-off-by: namkyu1999 <lak9348@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Saranya Jena <saranya.jena@harness.io>
2025-01-15 16:33:43 +05:30
Sayan Mondal c7ab5a3d7c
Merge pull request #732 from heysujal/add-openssh-clients
add openssh-clients to dockerfile
2025-01-15 11:28:17 +05:30
Shubham Chaudhary 3bef3ad67e
Merge branch 'master' into add-openssh-clients 2025-01-15 10:57:02 +05:30
Sujal Gupta b2f68a6ad1
use revertErr instead of err (#730)
Signed-off-by: Sujal Gupta <sujalgupta6100@gmail.com>
2025-01-15 10:38:32 +05:30
Sujal Gupta cd2ec26083 add openssh-clients to dockerfile
Signed-off-by: Sujal Gupta <sujalgupta6100@gmail.com>
2025-01-06 01:04:25 +05:30
Shubham Chaudhary 7e08c69750
chore(stress): Fix the stress faults (#723)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-11-20 15:18:59 +05:30
Namkyu Park 3ef23b01f9
feat: implement opentelemetry for distributed tracing (#706)
* feat: add otel & tracing for distributed tracing

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* feat: add tracing codes to chaoslib

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: misc

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: make otel optional

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: skip if litmus-go not received trace_parent

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: Set context.Context as a parameter in each function

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* update templates

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* feat: rename spans and enhance coverage

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: avoid shadowing

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: add logs

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: add logs

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: fix templates

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

---------

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-10-24 16:14:57 +05:30
Shubham Chaudhary 0cd6c6fae3
(chore): Fix the build, push, and release pipelines (#716)
* (chore): Fix the build, push, and release pipelines

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* (chore): Fix the dockerfile

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-10-15 23:33:54 +05:30
Shubham Chaudhary 6a386d1410
(chore): Fix the disk-fill fault (#715)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-10-15 22:15:14 +05:30
Vedant Shrotria fc646d678c
Merge pull request #707 from dusdjhyeon/ubi-migration
UBI migration of Images - go-runner
2024-08-23 11:32:44 +05:30
dusdjhyeon 6257c1abb8
feat: add build arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 16:13:18 +09:00
dusdjhyeon 755a562efe
Merge branch 'ubi-migration' of https://github.com/dusdjhyeon/litmus-go into ubi-migration 2024-08-22 16:10:37 +09:00
dusdjhyeon d0814df9ea
fix: set build args
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 16:09:40 +09:00
Vedant Shrotria a6012039fd
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:42 +05:30
Vedant Shrotria a1f602ba98
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:33 +05:30
Vedant Shrotria 7476994a36
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:25 +05:30
Vedant Shrotria 3440fb84eb
Update .github/workflows/release.yml 2024-08-22 11:18:46 +05:30
Vedant Shrotria 652e6b8465
Update .github/workflows/release.yml 2024-08-22 11:18:39 +05:30
Vedant Shrotria 996f3b3f5f
Update .github/workflows/push.yml 2024-08-22 11:18:10 +05:30
Vedant Shrotria e73f3bfb21
Update .github/workflows/push.yml 2024-08-22 11:17:54 +05:30
Vedant Shrotria 054d091dce
Update .github/workflows/build.yml 2024-08-22 11:17:37 +05:30
Vedant Shrotria c362119e05
Update .github/workflows/build.yml 2024-08-22 11:17:15 +05:30
dusdjhyeon 31bf293140
fix: change go version and others
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 14:39:17 +09:00
Vedant Shrotria 9569c8b2f4
Merge branch 'master' into ubi-migration 2024-08-21 16:25:14 +05:30
dusdjhyeon 4f9f4e0540
fix: upgrade version for vulnerability
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:58 +09:00
dusdjhyeon 399ccd68a0
fix: change kubectl crictl latest version
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:58 +09:00
Jongwoo Han 35958eae38
Rename env to EC2_INSTANCE_TAG (#708)
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon 003a3dc02c
fix: change docker repo
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon d4eed32a6d
fix: change version arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon af7322bece
fix: app_dir and yum
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon bd853f6e25
feat: migration base image
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon cfdb205ca3
fix: typos and add arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
Jongwoo Han f051d5ac7c
Rename env to EC2_INSTANCE_TAG (#708)
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2024-08-14 16:42:35 +05:30
Andrii Kotelnikov 10e9b774a8
Update workloads.go (#705)
Fix issue with empty kind field
Signed-off-by: Andrii Kotelnikov <andrusha@ukr.net>
2024-06-14 14:16:47 +05:30
Vedant Shrotria 9689f74fce
Merge pull request #701 from Jonsy13/add-gitleaks
Adding `gitleaks` as PR Check
2024-05-20 10:27:09 +05:30
Vedant Shrotria d273ba628b
Merge branch 'master' into add-gitleaks 2024-05-17 17:37:15 +05:30
Jonsy13 2315eaf2a4
Added gitleaks
Signed-off-by: Jonsy13 <vedant.shrotria@harness.io>
2024-05-17 17:34:36 +05:30
Shubham Chaudhary f2b2c2747a
chore(io-stress): Fix the pod-io-stress experiment (#700)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-05-17 16:43:19 +05:30
Udit Gaurav 66d01011bb
Fix pipeline issues (#694)
Fix pipeline issues

---------

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
2024-04-26 14:17:01 +05:30
Udit Gaurav a440615a51
Fix gofmt issues (#695) 2024-04-25 23:45:59 +05:30
Shubham Chaudhary 78eec36b79
chore(probe): Fix the probe description on failure (#692)
* chore(probe): Fix the probe description on failure

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(probe): Consider http timeout as probe failure

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-04-23 18:06:48 +05:30
Michael Morris b5a24b4044
enable ALL for TARGET_CONTAINER (#683)
Signed-off-by: MichaelMorris <michael.morris@est.tech>
2024-03-14 19:44:18 +05:30
Shubham Chaudhary 6d26c21506
test: Adding fuzz testing for common util (#691)
* test: Adding fuzz testing for common util

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* fix the random interval test

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-03-12 17:02:01 +05:30
Namkyu Park 5554a29ea2
chore: fix typos (#690)
Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-03-11 20:26:50 +05:30
Sayan Mondal 5f0d882912
test: Adding fuzz testing for common util (#688) 2024-03-08 21:42:20 +05:30
Namkyu Park eef3b4021d
feat: Add a k6-loadgen chaos fault (#687)
* feat: add k6-loadgen

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-03-07 19:19:51 +05:30
smit thakkar 96f6571e77
fix: accommodate pending pods with no IP address in network fault (#684)
Signed-off-by: smit thakkar <smit.thakkar@deliveryhero.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-03-01 15:06:07 +05:30
Nageshbansal b9f897be21
Adds support for tolerations in source cmd probe (#681)
Signed-off-by: nagesh bansal <nageshbansal59@gmail.com>
2024-03-01 14:51:55 +05:30
Michael Morris c2f8f79ab9
Fix consider appKind when filtering target pods (#680)
* Fix consider appKind when filtering target pods

Signed-off-by: MichaelMorris <michael.morris@est.tech>

* Implemented review comment

Signed-off-by: MichaelMorris <michael.morris@est.tech>

---------

Signed-off-by: MichaelMorris <michael.morris@est.tech>
2024-03-01 14:41:29 +05:30
Nageshbansal 69927489d2
Fixes Probe logging for all iterations (#676)
* Fixes Probe logging for all iterations

Signed-off-by: nagesh bansal <nageshbansal59@gmail.com>
2024-01-11 17:48:26 +05:30
Shubham Chaudhary bdddd0d803
Add port blacklisting in the pod-network faults (#673)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-10-12 19:37:56 +05:30
Shubham Chaudhary 1b75f78632
fix(action): Fix the github release action (#672)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-09-29 16:02:01 +05:30
Calvinaud b710216113
Revert chaos when error during drain for node-drain experiments (#668)
- Added a call to uncordonNode in case of an error in the drainNode function

Signed-off-by: Calvin Audier <calvin.audier@gmail.com>
2023-09-21 23:54:33 +05:30
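The revert-on-error pattern from #668 — uncordon the node as soon as draining fails, so it is not left unschedulable — can be sketched as follows. `drainNode` and `uncordonNode` are hypothetical stand-ins for the experiment's helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the experiment's drain and uncordon helpers.
var drainNode = func(node string) error { return errors.New("eviction blocked by PDB") }
var uncordonNode = func(node string) error { fmt.Println("uncordoned", node); return nil }

// drainWithRevert mirrors the fix: if draining fails midway, immediately
// uncordon so the chaos is reverted. Sketch only.
func drainWithRevert(node string) error {
	if err := drainNode(node); err != nil {
		if rerr := uncordonNode(node); rerr != nil {
			return fmt.Errorf("drain failed: %v; revert also failed: %v", err, rerr)
		}
		return fmt.Errorf("drain failed (node uncordoned): %v", err)
	}
	return nil
}

func main() {
	fmt.Println(drainWithRevert("node-1") != nil) // prints: true (after the uncordon log line)
}
```

The original error is still surfaced to the caller; only the node's cordoned state is rolled back.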
Shubham Chaudhary 392ea29800
chore(network): fix the destination ips for network experiment for service mesh (#666)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-09-15 11:00:34 +05:30
Shubham Chaudhary db13d05e28
Add fix to remove the job labels from helper pod (#665)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-07-24 13:09:57 +05:30
Vedant Shrotria d737281985
Merge pull request #661 from Jonsy13/group-optional-litmus-go
Upgrading chaos-operator version for making group optional in k8s probe
2023-06-05 13:05:51 +05:30
Jonsy13 61751a9404
Added changes for operator upgrade
Signed-off-by: Jonsy13 <vedant.shrotria@harness.io>
2023-06-05 12:34:12 +05:30
Shubham Chaudhary d4f9826ea9
chore(fields): Updating optional fields to pointer type (#658)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-25 14:02:22 +05:30
Shubham Chaudhary 3ab28a5110
run workflow on dispatch event and use token from secrets (#657)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 01:10:08 +05:30
Shubham Chaudhary 3005d02c24
use the official snyk action (#656)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 01:01:09 +05:30
Shubham Chaudhary 1971b8093b
fix the snyk token name (#655)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:35:26 +05:30
Shubham Chaudhary e5a831f713
fix the github workflow (#654)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:29:54 +05:30
Shubham Chaudhary 95c9602019
adding security scan workflow (#653)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:24:53 +05:30
Shubham Chaudhary f36b0761aa
adding security scan workflow (#652)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:21:19 +05:30
Shubham Chaudhary d3b760d76d
chore(unit): Adding units to the duration fields (#650)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-18 13:40:10 +05:30
Shubham Chaudhary 0bbe8e23e7
Revert "probe comparator logging for all iterations (#646)" (#649)
This reverts commit 8e0bbbbd5d.
2023-04-18 01:01:48 +05:30
Neelanjan Manna 5ade71c694
chore(probe): Update Probe failure descriptions and error codes (#648)
* adds probe description changes

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-04-17 17:24:23 +05:30
Shubham Chaudhary 8e0bbbbd5d
probe comparator logging for all iterations (#646)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-17 11:24:47 +05:30
Shubham Chaudhary d0b36e9a50
fix(probe): ProbeSuccessPercentage should not be 100% if experiment terminated with Error (#645)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-10 15:17:51 +05:30
Shubham Chaudhary eee4421c3c
chore(sdk): Updating the sdk to latest experiment schema (#644)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-03-20 17:01:46 +05:30
Neelanjan Manna a1c85ca52c
chore(experiments): Replaces default container runtime to containerd (#640)
* replaces default container runtime to containerd

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-03-14 19:41:02 +05:30
Shubham Chaudhary f8b370e6f4
add the experiment phase as completed with error (#642)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-03-09 21:52:17 +05:30
Neelanjan Manna 04c031a281
updates http probe wait duration to ms (#643)
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-03-08 12:46:21 +05:30
Shubham Chaudhary ea2b83e1a0
adding backend compatibility to probe retry (#639)
* adding backend compatibility to probe retry

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating the chaos-operator version

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-22 10:03:56 +05:30
Shubham Chaudhary 291ae4a6ad
chore(error-verdict): Adding experiment verdict as error (#637)
* chore(error-verdict): Adding experiment verdict as error

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating error verdict

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating the chaos-operator version

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* adding comments and changing function name

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-21 23:37:56 +05:30
Akash Shrivastava 8b68c4b5cb
Added filtering vm instance by tag (#635)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2023-02-15 16:48:47 +05:30
Shubham Chaudhary 7bdb18016f
chore(probe): updating retries to attempts and using the timeout as the per-attempt timeout (#636)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-09 17:02:31 +05:30
Shubham Chaudhary 4aa778ef9c
chore(probe-timeout): converting probe timeout to milliseconds (#634)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-05 01:34:39 +05:30
Shubham Chaudhary 1f02800c23
chore(parallel): add support to create unique runid for same timestamp (#633)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-20 11:11:12 +05:30
Shubham Chaudhary 2134933c03
fix(stderr): adding the fix for cmd.Exec considers log.info as stderr (#632)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-10 21:58:02 +05:30
Shubham Chaudhary d151c8f1e0
chore(sidecar): adding sidecar to the helper pod (#630)
* chore(sidecar): adding sidecar to the helper pod

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* adding support for multiple sidecars

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(sidecar): adding env and envFrom fields

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-10 12:58:57 +05:30
Shubham Chaudhary 3622f505c9
chore(probe): Adding the root cause into probe description (#628)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-09 15:15:14 +05:30
Shubham Chaudhary dc9283614b
chore(sdk): adding failstep and lib changes to sdk (#627)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-16 00:36:10 +05:30
Shubham Chaudhary 5eed28bf3f
fix(vulrn): fixing the security vulnerabilities (#617)
* fix(vulrn): fixing the security vulnerabilities

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-15 17:22:13 +05:30
Shubham Chaudhary 77b30e221e
(chore): Adding user-friendly failsteps and removing non-litmus libs (#626)
* feat(failstep):  Adding failstep in all experiment and removed non-litmus libs

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-15 16:42:27 +05:30
Neelanjan Manna eb98d50855
fix(gcp-label-experiments): Fix label filtering logic (#593)
* fix(gcp-label-experiments): fix label filter logic

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-11-24 19:27:46 +05:30
Akash Shrivastava 3e72bb14e9
changed dd to use nsenter (#605)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-24 11:02:36 +05:30
Shubham Chaudhary 115ec45339
fix(pod-delete): fixing pod-delete experiment and refactor workload utils (#610)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-22 17:29:33 +05:30
Shubham Chaudhary 0e18911da6
chore(spring-boot): add spring-boot all faults option and remove duplicate code (#609)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-21 23:39:32 +05:30
Shubham Chaudhary e1eb389edf
Adding single helper and selectors changes to master (#608)
* feat(helper): adding single helper per node


Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-21 22:58:46 +05:30
Akash Shrivastava 39bbdbbf44
assigned msg var (#606)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2022-11-18 14:14:57 +05:30
Shubham Chaudhary ff285178d5
chore(spring-boot): simplifying spring boot experiments env (#604)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-18 11:34:41 +05:30
Soumya Ghosh Dastidar f16249f802
feat: add resource name filtering in k8s probe (#598)
* feat: add resource name filtering in k8s probe

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
2022-11-14 12:49:55 +05:30
Shubham Chaudhary 21969543bf
chore(spring-boot): splitting spring-boot-chaos experiment into separate experiments (#594)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-14 11:30:41 +05:30
Shubham Chaudhary 7140565204
chore(sudo): fixing sudo command (#595)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-07 21:03:09 +05:30
Shubham Chaudhary 920c62d032
fix(dns-chaos): fixing the dns helper logs (#589)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-10-13 19:14:00 +05:30
Neelanjan Manna 7273da979a
Update google apis for GCP experiments and adds DefaultHealthCheck for GCP experiments (#580)
* updated probe default health check for GCP experiments

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-10-12 16:30:30 +05:30
Ashis Kumar Naik 35bb75fda9
Remove Redundant Steady-State Checks in GCP VM Instance Stop experiment (#554) (#585)
* Removed the redundant sanity checks in the GCP VM instance stop experiment in chaoslib, which were also defined in the steady-state check function for the experiment.

Signed-off-by: Ashis Kumar Naik <ashishami2002@gmail.com>
2022-10-12 09:45:21 +05:30
Neelanjan Manna e8ec4bd0df
fix(Experiment): Add status logs for GCP experiments (#583)
* added status logs to GCP experiments

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-10-11 10:29:05 +05:30
Shubham Chaudhary f80413639c
feat(dns-chaos): Adding containerd support for dns-chaos (#577)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-10-05 21:56:24 +05:30
Akash Shrivastava ce0ccb5cf8
added default healthcheck condition; Removed redundant code (#579)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2022-10-05 19:54:46 +05:30
Shubham Chaudhary 45b79a8916
chore(httpchaos): Adding support for serviceMesh (#578)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-10-05 14:57:19 +05:30
Udit Gaurav d4e05dbeb7
Chore(checks): Makes the default health check tunable and remove AUT and Aux checks from infra experiments (#576)
Signed-off-by: uditgaurav <udit@chaosnative.com>

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-10-01 00:37:58 +05:30
Udit Gaurav b69ed69aab
Chore(cmd-probe): Use experiment envs and volume in probe pod (#572)
* Chore(cmd-probe): Use experiment envs and volume in probe pod

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-09-29 18:44:00 +05:30
Shubham Chaudhary 5c09e5a36e
feat(ports): Adding source and destination ports support in network experiments (#570)
* feat(ports): Adding source and destination ports support in network experiments

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* Update chaoslib/litmus/network-chaos/helper/netem.go
2022-09-29 17:47:43 +05:30
Shubham Chaudhary 1ec871a62e
chore(httpProbe): Remove responseTimeout field, use the global probeTimeout field instead (#574)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-29 17:08:52 +05:30
Ashis Kumar Naik 50920bef44
Add missing Jitter ENV input for Pod Network Chaos experiments (#563) (#573)
* added the missing JITTER ENV for the network latency experiment to the experiment structure

* updated the default value of network latency to 2000 ms

Signed-off-by: Ashis Kumar Naik <ashishami2002@gmail.com>
2022-09-29 11:15:39 +05:30
Shubham Chaudhary 25d81a302a
update(sdk): updating operator sdk version (#571)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-27 16:57:26 +05:30
Saptarshi Sarkar c6a153f1fe
Updated README.md file (#559)
* Updated README.md file

Added link to the `License` file.
2022-09-26 11:11:43 +05:30
Tanmay Pandey 805af4f4bc
Fix helper pod issue for Kubelet Experiment (#543)
* Fix helper pod issue for Kubelet Experiment
Signed-off-by: Tanmay Pandey <tanmaypandey1998@gmail.com>
2022-09-26 10:46:37 +05:30
Shubham Chaudhary a83f346ea6
fix(stress): kill the stress process for abort (#569)
* fix(stress): kill the stress process for abort

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-23 16:14:51 +05:30
Shubham Chaudhary af535c19cc
fix(probe): Resiliency Score reaches more than 100 % with Probe failure (#568)
* chore(probe): remove ambiguous attribute phase

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(probe): handle edge mode when probe failed in prechaos phase but passed in postchaos phase with stopOnFailure set to false

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(probe): fixed the probeSuccessPercentage > 100 issue

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* add the failstep for probe failures

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* fixing onchaos probe to run only for chaos duration

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-21 11:59:43 +05:30
Chinmay Mehta 703f507461
Optimized the logic for duplicate IP check (#565)
Signed-off-by: chinmaym07 <b418020@iiit-bh.ac.in>
2022-09-21 10:37:51 +05:30
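The duplicate-IP-check optimization in #565 is the classic move from a nested-loop scan to a set lookup. A minimal sketch of the idea, with a hypothetical `uniqueIPs` helper:

```go
package main

import "fmt"

// uniqueIPs deduplicates in O(n) with a map-backed set, preserving order,
// instead of an O(n²) nested-loop comparison. Sketch of the idea only.
func uniqueIPs(ips []string) []string {
	seen := make(map[string]struct{}, len(ips))
	out := make([]string, 0, len(ips))
	for _, ip := range ips {
		if _, ok := seen[ip]; ok {
			continue
		}
		seen[ip] = struct{}{}
		out = append(out, ip)
	}
	return out
}

func main() {
	fmt.Println(uniqueIPs([]string{"10.0.0.1", "10.0.0.2", "10.0.0.1"}))
	// prints: [10.0.0.1 10.0.0.2]
}
```

For the target-IP lists a network fault resolves, this keeps dedup cheap even when many pods share service IPs.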
Shubham Chaudhary 84854b7851
fix(probe): Converting probeStatus as enum (#566)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-20 18:46:57 +05:30
Shubham Chaudhary f196f763a1
fix(abort): fixing chaosresult annotation conflict while updating chaosresult for abort scenarios (#567)
* fix(result): fix chaosresult update conflict issue

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-20 17:05:36 +05:30
Stéphane Cazeaux b87712921f
Added experiment and lib for spring-boot (#511)
* Added experiment and lib for spring-boot

Signed-off-by: Stéphane Cazeaux <stephane.cazeaux@orange.com>
2022-09-20 14:53:11 +05:30
Akash Shrivastava e3c0492a61
Response Body modification in HTTP Status code experiment (#556)
* added response body in status code; Added content encoding and type in body and status; Removed unnecessary logging

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2022-09-15 18:26:56 +05:30
Udit Gaurav f82e0357af
Chore(sdk): Adds SDK Template for Cloud based experiments (#560)
* Chore(sdk): Adds SDK Template for Cloud based experiments

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
2022-09-15 17:37:28 +05:30
Udit Gaurav 3f0d50813b
Chore(capability): Remove extra capabilities from stress chaos experiments (#557)
* Chore(capability): Remove extra capabilities from stress chaos experiments

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Update run-e2e-on-pr-commits.yml

* Update stress-chaos.go

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-09-14 18:00:23 +05:30
Shubham Chaudhary 0bae5bec27
Deriving podIPs of the pods behind a k8s service if the target pod has a serviceMesh sidecar (#558)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-12 16:19:09 +05:30
Shubham Chaudhary 718e8a8f18
chore(status): Handling terminated containers (#552)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-05 12:06:35 +05:30
Shubham Chaudhary 158c9a8f63
chore(sdk): Adding service account and helper pod failure check (#553)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-05 12:05:51 +05:30
Shubham Chaudhary f3203a8692
chore(history): Converting history field to pointer (#550)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-08-29 12:22:04 +05:30
Akash Shrivastava 671c5e04b8
Added support for status code list in HTTP Chaos (#545)
* Added support for selecting random code from list of codes in status code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Random code logic fix

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* log improvement

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2022-08-16 15:02:52 +05:30
Akash Shrivastava ad6b97f05d
Added toxicity support in HTTP chaos experiments (#544)
* Added toxicity support in HTTP chaos experiments

* Fixed issue with helper not reading toxicity env

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2022-08-16 15:02:24 +05:30
Udit Gaurav dc62a12af1
Fix(pipeline): Fixes e2e pipeline check (#549)
Signed-off-by: uditgaurav <udit@chaosnative.com>

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-08-16 14:34:08 +05:30
Shubham Chaudhary 06312c8893
chore(cmdProbe): Adding imagePullSecrets for source cmdProbe (#547)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-08-04 19:46:16 +05:30
Shubham Chaudhary f402bf8f08
chore(sdk): Adding support for helper based chaoslib (#546)
* chore(sdk): Adding support for helper based chaoslib

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(sdk): Adding support for helper based chaoslib

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: ispeakc0de <ashubham314@gmail.com>

* Update contribute/developer-guide/README.md

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2022-08-03 14:10:46 +05:30
Akash Shrivastava 535c1e7d05
Chore[New exp]: HTTP Modify Status Code experiment for K8s (#539)
* Added base code for http status code experiment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Minor fixes in toxiproxy args

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added appns etc in test.yaml

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Update experiments/generic/pod-http-status-code/experiment/pod-http-status-code.go

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>

* Added httpchaostype var

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Removed httpchaostype var and moved log into chaos type specific files

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* added check for status code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* restructured code; fixed random logic; improved logs

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* changed logic for ModifyResponseBody conversion

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* minor readme fix

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
2022-07-15 16:36:41 +05:30
Akash Shrivastava acdbe8126e
Chore[New exp]: HTTP Modify Headers experiment for K8s (#541)
* Added http modify header experiment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added entry in experiment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Minor fixes in toxiproxy args

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* moved tunables logs to specific lib file

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* improved code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* fixed issues in comments

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
2022-07-15 12:04:24 +05:30
Akash Shrivastava af6be30fbd
Chore[New exp]: HTTP Modify Body experiment for K8s (#540)
* Added http modify body experiment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* fixed issue with toxic command

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* fixed log issue

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* moved tunables logs to specific lib file

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* updated operator path

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
2022-07-15 00:26:42 +05:30
Kale Oum Nivrathi 8222832d3d
chore(Probes): Probe enhancements for cmdProbe as a source (#471)
* chore(Probes): Probe enhancements for cmdProbe as a source

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-07-14 20:51:38 +05:30
Shubham Chaudhary 01bc4d93aa
Updating litmus-client and k8s version to 1.21.2 (#542)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-07-14 19:27:33 +05:30
Akash Shrivastava 37748de56c
Chore[New exp]: HTTP Reset Peer experiment for K8s (#534)
* Added pod-http-reset-peer experiment code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added to experiment main; Improved and cleaned code; Improved logs

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added rbac and readme

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Update experiments/generic/pod-http-reset-peer/experiment/pod-http-reset-peer.go

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>

* moved tunables logs to specific lib file

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* removed unused check

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
2022-07-14 18:01:49 +05:30
Gonzalo Reyero Ferreras 9c6c7d1e42
Add missing return to AnnotatedApplicationsStatusCheck (#533)
Signed-off-by: Gonzalo Reyero Ferreras <greyerof@redhat.com>
2022-06-27 18:37:23 +05:30
Udit Gaurav 335b0d064a
Fix node level e2e pipeline to run the ci tests (#529)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-06-15 13:49:35 +05:30
Akash Shrivastava 3b04bfab25
Chore[New exp]: HTTP Chaos for K8s (#524)
* Added base code for httpchaos

Signed-off-by: Vedant Shrotria <vedant.shrotria@harness.io>

* Renamed files; Removed unused env vars; Added and restructures env vars; Restructured helper code; Restructured lib code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added support for sethelperdata env

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* fixed filename; Improved and cleaned logs

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Restructured to pass argument through separate lib file for new http experiment, no changes in helper lib required

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added exit checks in abort retries; Improved kill proxy; Added kill proxy if start proxy fails

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Improved logs for getcontainerid

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Changed retrying logic

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added readme, test.yaml and rbac.yaml; Fixed gofmt issue in helper.go

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Removed target_host env

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Improved error logging; Improved revert process

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Removed toxiproxy from dockerfile; Improved logs and comment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added network interface tunable; Made getContainerID runtime based as a standard function

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Changed TARGET_PORT->TARGET_SERVICE_PORT, LISTEN_PORT->PROXY_PORT

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2022-06-14 17:15:46 +05:30
Akash Shrivastava e32af9a434
Chore[Fix]: Node uncordon when app status check failed inside lib (#526)
* Added uncordon step when app status check fails

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Fixed error var issue; Changed deprecated flag from node drain command

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added logs for revert chaos when aut and auxapp check fails

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added [Revert] tag in logs

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
2022-06-14 11:11:00 +05:30
Jimmy Zhang a6435a6bd1
Add --stress-image for stressArgs for pumba lib (#521)
Signed-off-by: Jimmy Zhang <zhang.artur@gmail.com>
2022-06-01 12:54:19 +05:30
Neelanjan Manna 151ca50fe7
added ChaosResult verdict update step (#523)
Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-06-01 12:53:32 +05:30
Udit Gaurav 111534cf32
Chore(helper pod): Make setHelper data as tunable (#519)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-05-13 09:23:54 +05:30
Akash Shrivastava ffc96c09ae
return error if node not present (#516)
Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
2022-05-11 21:31:37 +05:30
Neelanjan Manna 940c7ffa30
updated appns podlist filtering error handling (#515)
Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
2022-05-10 19:00:57 +05:30
Akash Shrivastava a7694af725
Added Active Node Count Check using AWS APIs (#500)
* Added node count check using aws apis

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added node count check using aws apis to instance terminate by tag experiment

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Log improvements; Code improvement in findActiveNodeCount function;

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added log for instance status check failed in find active node count

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added check if active node count is less than provided instance ids

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
2022-05-10 15:48:58 +05:30
Soumya Ghosh Dastidar 8d43271bd2
fix: updated release workflow (#512)
Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
2022-05-10 15:19:45 +05:30
Shubham Chaudhary 973bb0ea1c
update(sdk): updating litmus sdk for the defaultAppHealthCheck (#513)
Signed-off-by: shubhamc <shubhamc@jfrog.com>

Co-authored-by: shubhamc <shubhamc@jfrog.com>
2022-05-10 15:19:12 +05:30
Neelanjan Manna 817f4d6199
GCP Experiments Refactor, New Label Selector Experiments and IAM Integration (#495)
* experiment init

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated experiment file

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated experiment lib

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated post chaos validation

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated empty slices to nil, updated experiment name in environment.go

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed experiment charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* bootstrapped gcp-vm-disk-loss-by-label artifacts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed device-names input for gcp-vm-disk-loss experiment, added API calls to derive device name internally

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant condition check in gcp-vm-disk-loss experiment pre-requisite checks

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* reformatted error messages

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* replaced the SetTargetInstances function

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added settargetdisk function for getting target disk names using label

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* refactored Target Disk Attached VM Instance memorisation, updated vm-disk-loss and added lib logic for vm-disk-loss-by-label experiment

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added experiment to bin and cleared default experiment name in environment.go

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated test.yml

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated AutoScalingGroup to ManagedInstanceGroup; updated logic for checking InstanceStop recovery for ManagedInstanceGroup VMs; Updated log and error messages with VM names

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant computeService code snippets

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant computeService code snippets in gcp-disk-loss experiments

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated logic for deriving default gcp sa credentials for computeService

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated logging for IAM integration

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* refactored log and error messages and wait for start/stop instances logic

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* fixed logs, optimised control statements, added comments, corrected experiment names

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* fixed file exists check logic

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated instance and device name fetch logic for disk loss

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated logs

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-04-28 17:54:28 +05:30
Udit Gaurav 85733418d2
Chore(ssm): Update the ssm file path in the Dockerfile (#508)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-04-16 22:42:07 +05:30
Udit Gaurav 6fcb641cca
Chore(warn): Remove warning "Neither --kubeconfig nor --master was specified" for InClusterConfig (#507)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-04-16 12:46:11 +05:30
Udit Gaurav 0cb4d22e2d
Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#499)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-04-14 13:15:32 +05:30
Udit Gaurav 7d7adcbef7
Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#498)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-04-14 10:10:21 +05:30
Udit Gaurav 1b894e57fc
Fix(targetContainer): Incorrect target container passed in the helper pod for pod level experiments (#496)
* Fix target container issue

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Fix target container issue

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-28 06:56:06 +05:30
Udit Gaurav 433e40d2fb
(enhancement) experiment: add node label filter for pod network and stress chaos (#494)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-16 16:06:10 +05:30
Udit Gaurav 8a63701113
Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill (#493)
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-15 22:31:01 +05:30
Udit Gaurav 0175a3ce90
Chore(randomize): Randomize stress-chaos tunables (#487)
* Chore(randomize): Randomize stress-chaos tunables

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Update stress-chaos.go
2022-03-15 22:21:21 +05:30
Udit Gaurav 8421105b47
Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment (#491)
* Chore(network-chaos):

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment

Signed-off-by: uditgaurav <udit@chaosnative.com>

Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
2022-03-15 14:28:46 +05:30
Udit Gaurav 4e7877bb92
Chore(snyk): Fix snyk security scan on litmus-go (#492)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-15 08:12:22 +05:30
Udit Gaurav 1ee2680988
Chore(cgroup): Add support for cgroup version2 in stress-chaos experiment (#490)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-14 16:05:47 +05:30
Udit Gaurav 271de31ce2
Chore(vulnerability): Remove openebs retry module and update pkgs (#488)
* Chore(vulnerability): Fix some vulnerabilities by updating the pkgs

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(vulnerability): Remove openebs retry module and update pkgs

Signed-off-by: udit <udit@chaosnative.com>
2022-03-03 18:24:21 +05:30
Raj Babu Das f12b0b4bb5
Fixing Alpine CVEs by upgrading the version (#486) 2022-02-21 21:24:42 +05:30
355 changed files with 20786 additions and 11707 deletions


@@ -12,19 +12,12 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.16
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
#TODO: Add Dockerfile linting
# Running go-lint
- name: Checking Go-Lint
run : |
sudo apt-get update && sudo apt-get install golint
make gotasks
- name: gofmt check
run: |
if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
@@ -33,20 +26,21 @@ jobs:
gofmt -s -l .
exit 1
fi
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
security:
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
gitleaks-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@master
- name: Run Snyk to check for vulnerabilities
uses: snyk/actions/golang@master
env:
SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
- uses: actions/checkout@v3
with:
args: --severity-threshold=high
fetch-depth: 0
- name: Run GitLeaks
run: |
wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz && \
tar -zxvf gitleaks_8.18.2_linux_x64.tar.gz && \
sudo mv gitleaks /usr/local/bin && gitleaks detect --source . -v
build:
needs: pre-checks
@@ -55,7 +49,7 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.16
go-version: '1.20'
- uses: actions/checkout@v2
with:
@@ -79,6 +73,7 @@ jobs:
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:ci
build-args: LITMUS_VERSION=3.10.0
trivy:
needs: pre-checks
@@ -90,8 +85,8 @@ jobs:
- name: Build an image from Dockerfile
run: |
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.10.0
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
@@ -100,4 +95,4 @@ jobs:
exit-code: '1'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'
severity: 'CRITICAL,HIGH'


@@ -13,16 +13,9 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.16
go-version: '1.20'
- uses: actions/checkout@v2
#TODO: Add Dockerfile linting
# Running go-lint
- name: Checking Go-Lint
run : |
sudo apt-get update && sudo apt-get install golint
make gotasks
- name: gofmt check
run: |
if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
@@ -31,9 +24,9 @@ jobs:
gofmt -s -l .
exit 1
fi
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
uses: reviewdog/action-golangci-lint@v1
push:
needs: pre-checks
@@ -43,7 +36,7 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.16
go-version: '1.20'
- uses: actions/checkout@v2
- name: Set up QEMU
@@ -70,3 +63,4 @@ jobs:
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:ci
build-args: LITMUS_VERSION=3.10.0


@@ -12,15 +9,9 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.16
go-version: '1.20'
- uses: actions/checkout@v2
#TODO: Add Dockerfile linting
# Running go-lint
- name: Checking Go-Lint
run : |
sudo apt-get update && sudo apt-get install golint
make gotasks
push:
needs: pre-checks
runs-on: ubuntu-latest
@@ -28,7 +22,7 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.16
go-version: '1.20'
- uses: actions/checkout@v2
- name: Set Tag
@@ -41,7 +35,7 @@ jobs:
run: |
echo "RELEASE TAG: ${RELEASE_TAG}"
echo "${RELEASE_TAG}" > ${{ github.workspace }}/tag.txt
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
with:
@@ -61,10 +55,11 @@ jobs:
- name: Build and push
uses: docker/build-push-action@v2
env:
env:
RELEASE_TAG: ${{ env.RELEASE_TAG }}
with:
push: true
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:${{ env.RELEASE_TAG }},litmuschaos/go-runner:latest
build-args: LITMUS_VERSION=3.10.0


@@ -9,215 +9,15 @@ on:
- '**.yaml'
jobs:
# Helm_Install_Generic_Tests:
# runs-on: ubuntu-18.04
# steps:
# - uses: actions/checkout@v2
# with:
# ref: ${{ github.event.pull_request.head.sha }}
# - name: Generate go binary and build docker image
# run: make build-amd64
# #Install and configure a kind cluster
# - name: Installing KinD cluster for the test
# uses: engineerd/setup-kind@v0.5.0
# with:
# version: "v0.7.0"
# config: "build/kind-cluster/kind-config.yaml"
# - name: Configuring and testing the Installation
# run: |
# kubectl taint nodes kind-control-plane node-role.kubernetes.io/master-
# kind get kubeconfig --internal >$HOME/.kube/config
# kubectl cluster-info --context kind-kind
# kubectl get nodes
# - name: Load docker image
# run: /usr/local/bin/kind load docker-image litmuschaos/go-runner:ci
# - name: Deploy a sample application for chaos injection
# run: |
# kubectl apply -f https://raw.githubusercontent.com/litmuschaos/chaos-ci-lib/master/app/nginx.yml
# kubectl wait --for=condition=Ready pods --all --namespace default --timeout=90s
# - name: Setting up kubeconfig ENV for Github Chaos Action
# run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
# env:
# ACTIONS_ALLOW_UNSECURE_COMMANDS: true
# - name: Setup Litmus
# uses: litmuschaos/github-chaos-actions@master
# env:
# INSTALL_LITMUS: true
# - name: Running Litmus pod delete chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-delete
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# - name: Running container kill chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: container-kill
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# CONTAINER_RUNTIME: containerd
# - name: Running node-cpu-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: node-cpu-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# - name: Running node-memory-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: node-memory-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# - name: Running pod-cpu-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-cpu-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# CPU_CORES: 1
# - name: Running pod-memory-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-memory-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# MEMORY_CONSUMPTION: 500
# - name: Running pod network corruption chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-corruption
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# CONTAINER_RUNTIME: containerd
# - name: Running pod network duplication chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-duplication
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# CONTAINER_RUNTIME: containerd
# - name: Running pod-network-latency chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-latency
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# NETWORK_LATENCY: 60000
# CONTAINER_RUNTIME: containerd
# - name: Running pod-network-loss chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-loss
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# NETWORK_PACKET_LOSS_PERCENTAGE: 100
# CONTAINER_RUNTIME: containerd
# - name: Running pod autoscaler chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-autoscaler
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TOTAL_CHAOS_DURATION: 60
# - name: Running node-io-stress chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: node-io-stress
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TOTAL_CHAOS_DURATION: 120
# FILESYSTEM_UTILIZATION_PERCENTAGE: 10
# - name: Uninstall Litmus
# uses: litmuschaos/github-chaos-actions@master
# env:
# LITMUS_CLEANUP: true
# - name: Deleting KinD cluster
# if: always()
# run: kind delete cluster
Pod_Level_In_Serial_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
- uses: actions/setup-go@v5
with:
go-version: '1.16'
go-version: '1.20'
- uses: actions/checkout@v2
with:
@@ -226,15 +26,28 @@ jobs:
- name: Generating Go binary and Building docker image
run: |
make build-amd64
#Install and configure a kind cluster
- name: Installing Prerequisites (K3S Cluster)
env:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
- name: Install KinD
run: |
curl -sfL https://get.k3s.io | sh -s - --docker --write-kubeconfig-mode 664
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
@@ -244,23 +57,24 @@ jobs:
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-series
- name: Deleting K3S cluster
- name: Deleting KinD cluster
if: always()
run: /usr/local/bin/k3s-uninstall.sh
run: kind delete cluster
Pod_Level_In_Parallel_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
- uses: actions/setup-go@v5
with:
go-version: '1.16'
go-version: '1.20'
- uses: actions/checkout@v2
with:
@@ -269,14 +83,30 @@ jobs:
- name: Generating Go binary and Building docker image
run: |
make build-amd64
#Install and configure a kind cluster
- name: Installing Prerequisites (K3S Cluster)
env:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
- name: Install KinD
run: |
curl -sfL https://get.k3s.io | sh -s - --docker --write-kubeconfig-mode 664
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
env:
KUBECONFIG: /home/runner/.kube/config
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
@@ -286,23 +116,24 @@ jobs:
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-parallel
- name: Deleting K3S cluster
- name: Deleting KinD cluster
if: always()
run: /usr/local/bin/k3s-uninstall.sh
run: kind delete cluster
Node_Level_Tests:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
- uses: actions/setup-go@v5
with:
go-version: '1.16'
go-version: '1.20'
- uses: actions/checkout@v2
with:
@@ -312,19 +143,26 @@ jobs:
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: kind create cluster --config build/kind-cluster/kind-config.yaml
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/master-
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
@ -355,4 +193,6 @@ jobs:
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
run: |
kubectl get nodes
kind delete cluster

.github/workflows/security-scan.yml (vendored, new file, 27 lines)

@ -0,0 +1,27 @@
---
name: Security Scan
on:
workflow_dispatch:
jobs:
trivy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Build an image from Dockerfile
run: |
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: 'docker.io/litmuschaos/go-runner:${{ github.sha }}'
format: 'table'
exit-code: '1'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'


@ -31,7 +31,7 @@ deps: _build_check_docker
_build_check_docker:
@echo "------------------"
@echo "--> Check the Docker deps"
@echo "--> Check the Docker deps"
@echo "------------------"
@if [ $(IS_DOCKER_INSTALLED) -eq 1 ]; \
then echo "" \
@ -56,7 +56,7 @@ unused-package-check:
.PHONY: docker.buildx
docker.buildx:
@echo "------------------------------"
@echo "--> Setting up Builder "
@echo "--> Setting up Builder "
@echo "------------------------------"
@if ! docker buildx ls | grep -q multibuilder; then\
docker buildx create --name multibuilder;\
@ -69,27 +69,27 @@ push: docker.buildx image-push
image-push:
@echo "------------------------"
@echo "--> Push go-runner image"
@echo "--> Push go-runner image"
@echo "------------------------"
@echo "Pushing $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)"
@docker buildx build . --push --file build/Dockerfile --progress plane --platform linux/arm64,linux/amd64 --no-cache --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
@docker buildx build . --push --file build/Dockerfile --progress plain --platform linux/arm64,linux/amd64 --no-cache --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
.PHONY: build-amd64
build-amd64:
@echo "-------------------------"
@echo "--> Build go-runner image"
@echo "--> Build go-runner image"
@echo "-------------------------"
@sudo docker build --file build/Dockerfile --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) . --build-arg TARGETARCH=amd64
@sudo docker build --file build/Dockerfile --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
.PHONY: push-amd64
push-amd64:
@echo "------------------------------"
@echo "--> Pushing image"
@echo "--> Pushing image"
@echo "------------------------------"
@sudo docker push $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
.PHONY: trivy-check
trivy-check:


@ -34,4 +34,7 @@ You can contribute by raising issues, improving the documentation, contributing
Head over to the [Contribution guide](CONTRIBUTING.md)
## License
Here is a copy of the License: [`License`](LICENSE)
## License Status and Vulnerability Check
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go?ref=badge_large)


@ -1,7 +1,11 @@
package main
import (
"context"
"errors"
"flag"
"os"
// Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth"
@ -11,13 +15,17 @@ import (
// _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
// _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"
"go.opentelemetry.io/otel"
awsSSMChaosByID "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-id/experiment"
awsSSMChaosByTag "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-tag/experiment"
azureDiskLoss "github.com/litmuschaos/litmus-go/experiments/azure/azure-disk-loss/experiment"
azureInstanceStop "github.com/litmuschaos/litmus-go/experiments/azure/instance-stop/experiment"
redfishNodeRestart "github.com/litmuschaos/litmus-go/experiments/baremetal/redfish-node-restart/experiment"
cassandraPodDelete "github.com/litmuschaos/litmus-go/experiments/cassandra/pod-delete/experiment"
gcpVMDiskLossByLabel "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-disk-loss-by-label/experiment"
gcpVMDiskLoss "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-disk-loss/experiment"
gcpVMInstanceStopByLabel "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-instance-stop-by-label/experiment"
gcpVMInstanceStop "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-instance-stop/experiment"
containerKill "github.com/litmuschaos/litmus-go/experiments/generic/container-kill/experiment"
diskFill "github.com/litmuschaos/litmus-go/experiments/generic/disk-fill/experiment"
@ -36,6 +44,11 @@ import (
podDNSError "github.com/litmuschaos/litmus-go/experiments/generic/pod-dns-error/experiment"
podDNSSpoof "github.com/litmuschaos/litmus-go/experiments/generic/pod-dns-spoof/experiment"
podFioStress "github.com/litmuschaos/litmus-go/experiments/generic/pod-fio-stress/experiment"
podHttpLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-latency/experiment"
podHttpModifyBody "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-modify-body/experiment"
podHttpModifyHeader "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-modify-header/experiment"
podHttpResetPeer "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-reset-peer/experiment"
podHttpStatusCode "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-status-code/experiment"
podIOStress "github.com/litmuschaos/litmus-go/experiments/generic/pod-io-stress/experiment"
podMemoryHogExec "github.com/litmuschaos/litmus-go/experiments/generic/pod-memory-hog-exec/experiment"
podMemoryHog "github.com/litmuschaos/litmus-go/experiments/generic/pod-memory-hog/experiment"
@ -44,15 +57,19 @@ import (
podNetworkLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-latency/experiment"
podNetworkLoss "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-loss/experiment"
podNetworkPartition "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-partition/experiment"
podNetworkRateLimit "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-rate-limit/experiment"
kafkaBrokerPodFailure "github.com/litmuschaos/litmus-go/experiments/kafka/kafka-broker-pod-failure/experiment"
ebsLossByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-id/experiment"
ebsLossByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-tag/experiment"
ec2TerminateByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-id/experiment"
ec2TerminateByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-tag/experiment"
rdsInstanceStop "github.com/litmuschaos/litmus-go/experiments/kube-aws/rds-instance-stop/experiment"
k6Loadgen "github.com/litmuschaos/litmus-go/experiments/load/k6-loadgen/experiment"
springBootFaults "github.com/litmuschaos/litmus-go/experiments/spring-boot/spring-boot-faults/experiment"
vmpoweroff "github.com/litmuschaos/litmus-go/experiments/vmware/vm-poweroff/experiment"
"github.com/litmuschaos/litmus-go/pkg/clients"
cli "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus"
)
@ -66,8 +83,25 @@ func init() {
}
func main() {
initCtx := context.Background()
clients := clients.ClientSets{}
// Set up Observability.
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
shutdown, err := telemetry.InitOTelSDK(initCtx, true, otelExporterEndpoint)
if err != nil {
log.Errorf("Failed to initialize OTel SDK: %v", err)
return
}
defer func() {
err = errors.Join(err, shutdown(initCtx))
}()
initCtx = telemetry.GetTraceParentContext()
}
clients := cli.ClientSets{}
ctx, span := otel.Tracer(telemetry.TracerName).Start(initCtx, "ExecuteExperiment")
defer span.End()
// parse the experiment name
experimentName := flag.String("name", "pod-delete", "name of the chaos experiment")
@ -80,87 +114,108 @@ func main() {
log.Infof("Experiment Name: %v", *experimentName)
// invoke the corresponding experiment based on the the (-name) flag
// invoke the corresponding experiment based on the (-name) flag
switch *experimentName {
case "container-kill":
containerKill.ContainerKill(clients)
containerKill.ContainerKill(ctx, clients)
case "disk-fill":
diskFill.DiskFill(clients)
diskFill.DiskFill(ctx, clients)
case "kafka-broker-pod-failure":
kafkaBrokerPodFailure.KafkaBrokerPodFailure(clients)
kafkaBrokerPodFailure.KafkaBrokerPodFailure(ctx, clients)
case "kubelet-service-kill":
kubeletServiceKill.KubeletServiceKill(clients)
kubeletServiceKill.KubeletServiceKill(ctx, clients)
case "docker-service-kill":
dockerServiceKill.DockerServiceKill(clients)
dockerServiceKill.DockerServiceKill(ctx, clients)
case "node-cpu-hog":
nodeCPUHog.NodeCPUHog(clients)
nodeCPUHog.NodeCPUHog(ctx, clients)
case "node-drain":
nodeDrain.NodeDrain(clients)
nodeDrain.NodeDrain(ctx, clients)
case "node-io-stress":
nodeIOStress.NodeIOStress(clients)
nodeIOStress.NodeIOStress(ctx, clients)
case "node-memory-hog":
nodeMemoryHog.NodeMemoryHog(clients)
nodeMemoryHog.NodeMemoryHog(ctx, clients)
case "node-taint":
nodeTaint.NodeTaint(clients)
nodeTaint.NodeTaint(ctx, clients)
case "pod-autoscaler":
podAutoscaler.PodAutoscaler(clients)
podAutoscaler.PodAutoscaler(ctx, clients)
case "pod-cpu-hog-exec":
podCPUHogExec.PodCPUHogExec(clients)
podCPUHogExec.PodCPUHogExec(ctx, clients)
case "pod-delete":
podDelete.PodDelete(clients)
podDelete.PodDelete(ctx, clients)
case "pod-io-stress":
podIOStress.PodIOStress(clients)
podIOStress.PodIOStress(ctx, clients)
case "pod-memory-hog-exec":
podMemoryHogExec.PodMemoryHogExec(clients)
podMemoryHogExec.PodMemoryHogExec(ctx, clients)
case "pod-network-corruption":
podNetworkCorruption.PodNetworkCorruption(clients)
podNetworkCorruption.PodNetworkCorruption(ctx, clients)
case "pod-network-duplication":
podNetworkDuplication.PodNetworkDuplication(clients)
podNetworkDuplication.PodNetworkDuplication(ctx, clients)
case "pod-network-latency":
podNetworkLatency.PodNetworkLatency(clients)
podNetworkLatency.PodNetworkLatency(ctx, clients)
case "pod-network-loss":
podNetworkLoss.PodNetworkLoss(clients)
podNetworkLoss.PodNetworkLoss(ctx, clients)
case "pod-network-partition":
podNetworkPartition.PodNetworkPartition(clients)
podNetworkPartition.PodNetworkPartition(ctx, clients)
case "pod-network-rate-limit":
podNetworkRateLimit.PodNetworkRateLimit(ctx, clients)
case "pod-memory-hog":
podMemoryHog.PodMemoryHog(clients)
podMemoryHog.PodMemoryHog(ctx, clients)
case "pod-cpu-hog":
podCPUHog.PodCPUHog(clients)
podCPUHog.PodCPUHog(ctx, clients)
case "cassandra-pod-delete":
cassandraPodDelete.CasssandraPodDelete(clients)
cassandraPodDelete.CasssandraPodDelete(ctx, clients)
case "aws-ssm-chaos-by-id":
awsSSMChaosByID.AWSSSMChaosByID(clients)
awsSSMChaosByID.AWSSSMChaosByID(ctx, clients)
case "aws-ssm-chaos-by-tag":
awsSSMChaosByTag.AWSSSMChaosByTag(clients)
awsSSMChaosByTag.AWSSSMChaosByTag(ctx, clients)
case "ec2-terminate-by-id":
ec2TerminateByID.EC2TerminateByID(clients)
ec2TerminateByID.EC2TerminateByID(ctx, clients)
case "ec2-terminate-by-tag":
ec2TerminateByTag.EC2TerminateByTag(clients)
ec2TerminateByTag.EC2TerminateByTag(ctx, clients)
case "ebs-loss-by-id":
ebsLossByID.EBSLossByID(clients)
ebsLossByID.EBSLossByID(ctx, clients)
case "ebs-loss-by-tag":
ebsLossByTag.EBSLossByTag(clients)
ebsLossByTag.EBSLossByTag(ctx, clients)
case "rds-instance-stop":
rdsInstanceStop.RDSInstanceStop(ctx, clients)
case "node-restart":
nodeRestart.NodeRestart(clients)
nodeRestart.NodeRestart(ctx, clients)
case "pod-dns-error":
podDNSError.PodDNSError(clients)
podDNSError.PodDNSError(ctx, clients)
case "pod-dns-spoof":
podDNSSpoof.PodDNSSpoof(clients)
podDNSSpoof.PodDNSSpoof(ctx, clients)
case "pod-http-latency":
podHttpLatency.PodHttpLatency(ctx, clients)
case "pod-http-status-code":
podHttpStatusCode.PodHttpStatusCode(ctx, clients)
case "pod-http-modify-header":
podHttpModifyHeader.PodHttpModifyHeader(ctx, clients)
case "pod-http-modify-body":
podHttpModifyBody.PodHttpModifyBody(ctx, clients)
case "pod-http-reset-peer":
podHttpResetPeer.PodHttpResetPeer(ctx, clients)
case "vm-poweroff":
vmpoweroff.VMPoweroff(clients)
vmpoweroff.VMPoweroff(ctx, clients)
case "azure-instance-stop":
azureInstanceStop.AzureInstanceStop(clients)
azureInstanceStop.AzureInstanceStop(ctx, clients)
case "azure-disk-loss":
azureDiskLoss.AzureDiskLoss(clients)
azureDiskLoss.AzureDiskLoss(ctx, clients)
case "gcp-vm-disk-loss":
gcpVMDiskLoss.VMDiskLoss(clients)
gcpVMDiskLoss.VMDiskLoss(ctx, clients)
case "pod-fio-stress":
podFioStress.PodFioStress(clients)
podFioStress.PodFioStress(ctx, clients)
case "gcp-vm-instance-stop":
gcpVMInstanceStop.VMInstanceStop(clients)
gcpVMInstanceStop.VMInstanceStop(ctx, clients)
case "redfish-node-restart":
redfishNodeRestart.NodeRestart(clients)
redfishNodeRestart.NodeRestart(ctx, clients)
case "gcp-vm-instance-stop-by-label":
gcpVMInstanceStopByLabel.GCPVMInstanceStopByLabel(ctx, clients)
case "gcp-vm-disk-loss-by-label":
gcpVMDiskLossByLabel.GCPVMDiskLossByLabel(ctx, clients)
case "spring-boot-cpu-stress", "spring-boot-memory-stress", "spring-boot-exceptions", "spring-boot-app-kill", "spring-boot-faults", "spring-boot-latency":
springBootFaults.Experiment(ctx, clients, *experimentName)
case "k6-loadgen":
k6Loadgen.Experiment(ctx, clients)
default:
log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *experimentName)
return


@ -1,7 +1,11 @@
package main
import (
"context"
"errors"
"flag"
"os"
// Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth"
@ -13,13 +17,15 @@ import (
containerKill "github.com/litmuschaos/litmus-go/chaoslib/litmus/container-kill/helper"
diskFill "github.com/litmuschaos/litmus-go/chaoslib/litmus/disk-fill/helper"
httpChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/helper"
networkChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/helper"
dnsChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/pod-dns-chaos/helper"
stressChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/helper"
"github.com/litmuschaos/litmus-go/pkg/clients"
cli "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
func init() {
@ -32,8 +38,24 @@ func init() {
}
func main() {
ctx := context.Background()
// Set up Observability.
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
shutdown, err := telemetry.InitOTelSDK(ctx, true, otelExporterEndpoint)
if err != nil {
log.Errorf("Failed to initialize OTel SDK: %v", err)
return
}
defer func() {
err = errors.Join(err, shutdown(ctx))
}()
ctx = telemetry.GetTraceParentContext()
}
clients := clients.ClientSets{}
clients := cli.ClientSets{}
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "ExecuteExperimentHelper")
defer span.End()
// parse the helper name
helperName := flag.String("name", "", "name of the helper pod")
@ -49,15 +71,17 @@ func main() {
// invoke the corresponding helper based on the (-name) flag
switch *helperName {
case "container-kill":
containerKill.Helper(clients)
containerKill.Helper(ctx, clients)
case "disk-fill":
diskFill.Helper(clients)
diskFill.Helper(ctx, clients)
case "dns-chaos":
dnsChaos.Helper(clients)
dnsChaos.Helper(ctx, clients)
case "stress-chaos":
stressChaos.Helper(clients)
stressChaos.Helper(ctx, clients)
case "network-chaos":
networkChaos.Helper(clients)
networkChaos.Helper(ctx, clients)
case "http-chaos":
httpChaos.Helper(ctx, clients)
default:
log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *helperName)


@ -1,6 +1,6 @@
# Multi-stage docker build
# Build stage
FROM golang:alpine AS builder
FROM golang:1.22 AS builder
ARG TARGETOS=linux
ARG TARGETARCH
@ -14,23 +14,99 @@ RUN export GOOS=${TARGETOS} && \
RUN CGO_ENABLED=0 go build -o /output/experiments ./bin/experiment
RUN CGO_ENABLED=0 go build -o /output/helpers ./bin/helper
FROM alpine:3.14.2 AS dep
# Install generally useful things
RUN apk --update add \
sudo \
iproute2
# Packaging stage
# Image source: https://github.com/litmuschaos/test-tools/blob/master/custom/hardened-alpine/experiment/Dockerfile
# The base image is non-root (have litmus user) with default litmus directory.
FROM litmuschaos/experiment-alpine
FROM registry.access.redhat.com/ubi9/ubi:9.4
LABEL maintainer="LitmusChaos"
COPY --from=builder /output/ /litmus
COPY --from=dep /usr/bin/sudo /usr/bin/
COPY --from=dep /sbin/tc /sbin/
ARG TARGETARCH
ARG LITMUS_VERSION
#Copying Necessary Files
COPY ./pkg/cloud/aws/common/ssm-docs/LitmusChaos-AWS-SSM-Docs.yml ./litmus/LitmusChaos-AWS-SSM-Docs.yml
# Install generally useful things
RUN yum install -y \
sudo \
sshpass \
procps \
openssh-clients
# tc binary
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-6.2.0-5.el9.$(uname -m).rpm
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-tc-6.2.0-5.el9.$(uname -m).rpm
# iptables
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iptables-libs-1.8.8-6.el9_1.$(uname -m).rpm
RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-libs-1.8.8-6.el9.2.$(uname -m).rpm
RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-1.8.8-6.el9.2.$(uname -m).rpm
# stress-ng
RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/Judy-1.0.5-28.el9.$(uname -m).rpm
RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/stress-ng-0.14.00-2.el9.$(uname -m).rpm
#Installing Kubectl
ENV KUBE_LATEST_VERSION="v1.31.0"
RUN curl -L https://storage.googleapis.com/kubernetes-release/release/${KUBE_LATEST_VERSION}/bin/linux/${TARGETARCH}/kubectl -o /usr/bin/kubectl && \
chmod 755 /usr/bin/kubectl
#Installing crictl binaries
RUN curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.31.1/crictl-v1.31.1-linux-${TARGETARCH}.tar.gz --output crictl-v1.31.1-linux-${TARGETARCH}.tar.gz && \
tar zxvf crictl-v1.31.1-linux-${TARGETARCH}.tar.gz -C /sbin && \
chmod 755 /sbin/crictl
#Installing promql cli binaries
RUN curl -L https://github.com/chaosnative/promql-cli/releases/download/3.0.0-beta6/promql_linux_${TARGETARCH} --output /usr/bin/promql && chmod 755 /usr/bin/promql
#Installing pause cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/pause-linux-${TARGETARCH} --output /usr/bin/pause && chmod 755 /usr/bin/pause
#Installing dns_interceptor cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/dns_interceptor --output /sbin/dns_interceptor && chmod 755 /sbin/dns_interceptor
#Installing nsutil cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil-linux-${TARGETARCH} --output /sbin/nsutil && chmod 755 /sbin/nsutil
#Installing nsutil shared lib
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil_${TARGETARCH}.so --output /usr/local/lib/nsutil.so && chmod 755 /usr/local/lib/nsutil.so
# Installing toxiproxy binaries
RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/cli/cli/toxiproxy-cli-linux-${TARGETARCH}.tar.gz --output toxiproxy-cli-linux-${TARGETARCH}.tar.gz && \
tar zxvf toxiproxy-cli-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
chmod 755 /sbin/toxiproxy-cli
RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/server/server/toxiproxy-server-linux-${TARGETARCH}.tar.gz --output toxiproxy-server-linux-${TARGETARCH}.tar.gz && \
tar zxvf toxiproxy-server-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
chmod 755 /sbin/toxiproxy-server
ENV APP_USER=litmus
ENV APP_DIR="/$APP_USER"
ENV DATA_DIR="$APP_DIR/data"
# The USER_ID of the user
ENV APP_USER_ID=2000
RUN useradd -s /bin/true -u $APP_USER_ID -m -d $APP_DIR $APP_USER
# change to 0(root) group because openshift will run container with arbitrary uid as a member of root group
RUN chgrp -R 0 "$APP_DIR" && chmod -R g=u "$APP_DIR"
# Giving sudo to all users (required for almost all experiments)
RUN echo 'ALL ALL=(ALL:ALL) NOPASSWD: ALL' >> /etc/sudoers
WORKDIR $APP_DIR
COPY --from=builder /output/ .
COPY --from=docker:27.0.3 /usr/local/bin/docker /sbin/docker
RUN chmod 755 /sbin/docker
# Set permissions and ownership for the copied binaries
RUN chmod 755 ./experiments ./helpers && \
chown ${APP_USER}:0 ./experiments ./helpers
# Set ownership for binaries in /sbin and /usr/bin
RUN chown ${APP_USER}:0 /sbin/* /usr/bin/* && \
chown root:root /usr/bin/sudo && \
chmod 4755 /usr/bin/sudo
# Copying Necessary Files
COPY ./pkg/cloud/aws/common/ssm-docs/LitmusChaos-AWS-SSM-Docs.yml ./LitmusChaos-AWS-SSM-Docs.yml
RUN chown ${APP_USER}:0 ./LitmusChaos-AWS-SSM-Docs.yml && chmod 755 ./LitmusChaos-AWS-SSM-Docs.yml
USER ${APP_USER}


@ -1,7 +1,6 @@
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker


@ -1,22 +1,28 @@
package lib
import (
"context"
"os"
"strings"
"time"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
//InjectChaosInSerialMode will inject the aws ssm chaos in serial mode that is one after other
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
// InjectChaosInSerialMode will inject the aws ssm chaos in serial mode, that is, one after the other
func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInSerialMode")
defer span.End()
select {
case <-inject:
@ -45,7 +51,7 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
ec2IDList := strings.Fields(ec2ID)
commandId, err := ssm.SendSSMCommand(experimentsDetails, ec2IDList)
if err != nil {
return errors.Errorf("fail to send ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to send ssm command")
}
//prepare commands for abort recovery
experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
@ -53,21 +59,23 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//wait for the ssm command to get in running state
log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return errors.Errorf("fail to start ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to start ssm command")
}
common.SetTargets(ec2ID, "injected", "EC2", chaosDetails)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//wait for the ssm command to get succeeded in the given chaos duration
log.Info("[Wait]: Waiting for the ssm command to get completed")
if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return errors.Errorf("fail to send ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to send ssm command")
}
common.SetTargets(ec2ID, "reverted", "EC2", chaosDetails)
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
@ -82,7 +90,9 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// InjectChaosInParallelMode will inject the aws ssm chaos in parallel mode, that is, all at once
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInParallelMode")
defer span.End()
select {
case <-inject:
@ -107,7 +117,7 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Info("[Chaos]: Starting the ssm command")
commandId, err := ssm.SendSSMCommand(experimentsDetails, instanceIDList)
if err != nil {
return errors.Errorf("fail to send ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to send ssm command")
}
//prepare commands for abort recovery
experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
@ -116,14 +126,14 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//wait for the ssm command to get in running state
log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return errors.Errorf("fail to start ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to start ssm command")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -131,7 +141,7 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//wait for the ssm command to get succeeded in the given chaos duration
log.Info("[Wait]: Waiting for the ssm command to get completed")
if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return errors.Errorf("fail to send ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to send ssm command")
}
}
@ -156,14 +166,14 @@ func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, abort c
case len(experimentsDetails.CommandIDs) != 0:
for _, commandId := range experimentsDetails.CommandIDs {
if err := ssm.CancelCommand(commandId, experimentsDetails.Region); err != nil {
log.Errorf("[Abort]: fail to cancle command, recovery failed, err: %v", err)
log.Errorf("[Abort]: Failed to cancel command, recovery failed: %v", err)
}
}
default:
log.Info("[Abort]: No command found to cancle")
log.Info("[Abort]: No SSM Command found to cancel")
}
if err := ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region); err != nil {
log.Errorf("fail to delete ssm doc, err: %v", err)
log.Errorf("Failed to delete ssm document: %v", err)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)


@ -1,6 +1,8 @@
package ssm
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
@ -8,12 +10,15 @@ import (
"github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@ -21,8 +26,10 @@ var (
inject, abort chan os.Signal
)
//PrepareAWSSSMChaosByID contains the prepration and injection steps for the experiment
func PrepareAWSSSMChaosByID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareAWSSSMChaosByID contains the preparation and injection steps for the experiment
func PrepareAWSSSMChaosByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSSSMFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -42,7 +49,7 @@ func PrepareAWSSSMChaosByID(experimentsDetails *experimentTypes.ExperimentDetail
//create and upload the ssm document on the given aws service monitoring docs
if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
return errors.Errorf("fail to create and upload ssm doc, err: %v", err)
return stacktrace.Propagate(err, "could not create and upload the ssm document")
}
experimentsDetails.IsDocsUploaded = true
log.Info("[Info]: SSM docs uploaded successfully")
@ -52,27 +59,27 @@ func PrepareAWSSSMChaosByID(experimentsDetails *experimentTypes.ExperimentDetail
//get the instance id or list of instance ids
instanceIDList := strings.Split(experimentsDetails.EC2InstanceID, ",")
if len(instanceIDList) == 0 {
return errors.Errorf("no instance id found for chaos injection")
if experimentsDetails.EC2InstanceID == "" || len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = lib.InjectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return err
if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = lib.InjectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return err
if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Delete the ssm document on the given aws service monitoring docs
err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
if err != nil {
return errors.Errorf("fail to delete ssm doc, err: %v", err)
return stacktrace.Propagate(err, "failed to delete ssm doc")
}
//Waiting for the ramp time after chaos injection


@ -1,6 +1,8 @@
package ssm
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
@ -8,16 +10,21 @@ import (
"github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
//PrepareAWSSSMChaosByTag contains the prepration and injection steps for the experiment
func PrepareAWSSSMChaosByTag(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareAWSSSMChaosByTag contains the preparation and injection steps for the experiment
func PrepareAWSSSMChaosByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -37,7 +44,7 @@ func PrepareAWSSSMChaosByTag(experimentsDetails *experimentTypes.ExperimentDetai
//create and upload the ssm document on the given aws service monitoring docs
if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
return errors.Errorf("fail to create and upload ssm doc, err: %v", err)
return stacktrace.Propagate(err, "could not create and upload the ssm document")
}
experimentsDetails.IsDocsUploaded = true
log.Info("[Info]: SSM docs uploaded successfully")
@ -48,26 +55,26 @@ func PrepareAWSSSMChaosByTag(experimentsDetails *experimentTypes.ExperimentDetai
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIDList))
if len(instanceIDList) == 0 {
return errors.Errorf("no instance id found for chaos injection")
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = lib.InjectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return err
if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = lib.InjectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return err
if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Delete the ssm document on the given aws service monitoring docs
err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
if err != nil {
return errors.Errorf("fail to delete ssm doc, err: %v", err)
return stacktrace.Propagate(err, "failed to delete ssm doc")
}
//Waiting for the ramp time after chaos injection


@ -1,6 +1,8 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
@ -9,16 +11,19 @@ import (
"github.com/Azure/azure-sdk-for-go/profiles/latest/compute/mgmt/compute"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/disk-loss/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
diskStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/disk"
instanceStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@ -26,8 +31,10 @@ var (
inject, abort chan os.Signal
)
//PrepareChaos contains the prepration and injection steps for the experiment
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareChaos contains the preparation and injection steps for the experiment
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureDiskLossFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -47,13 +54,13 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
//get the disk name or list of disk names
diskNameList := strings.Split(experimentsDetails.VirtualDiskNames, ",")
if len(diskNameList) == 0 {
return errors.Errorf("no volume names found to detach")
if experimentsDetails.VirtualDiskNames == "" || len(diskNameList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume names found to detach"}
}
instanceNamesWithDiskNames, err := diskStatus.GetInstanceNameForDisks(diskNameList, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup)
if err != nil {
return errors.Errorf("error fetching attached instances for disks, err: %v", err)
return stacktrace.Propagate(err, "error fetching attached instances for disks")
}
// Get the instance name with attached disks
@ -62,7 +69,7 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
for instanceName := range instanceNamesWithDiskNames {
attachedDisksWithInstance[instanceName], err = diskStatus.GetInstanceDiskList(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, experimentsDetails.ScaleSet, instanceName)
if err != nil {
return errors.Errorf("error fetching virtual disks, err: %v", err)
return stacktrace.Propagate(err, "error fetching virtual disks")
}
}
@ -77,15 +84,15 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -97,8 +104,10 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
return nil
}
// injectChaosInParallelMode will inject the azure disk loss chaos in parallel mode that is all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInParallelMode will inject the Azure disk loss chaos in parallel mode that is all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@ -107,7 +116,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on azure virtual disk"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disk"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
@ -116,7 +125,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Info("[Chaos]: Detaching the virtual disks from the instances")
for instanceName, diskNameList := range instanceNamesWithDiskNames {
if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
return errors.Errorf("failed to detach disks, err: %v", err)
return stacktrace.Propagate(err, "failed to detach disks")
}
}
// Waiting for disk to be detached
@ -124,7 +133,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
return errors.Errorf("disk attach check failed, err: %v", err)
return stacktrace.Propagate(err, "disk detachment check failed")
}
}
}
@ -137,8 +146,8 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -150,24 +159,24 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Info("[Chaos]: Attaching the Virtual disks back to the instances")
for instanceName, diskNameList := range attachedDisksWithInstance {
if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
return errors.Errorf("virtual disk attachment failed, err: %v", err)
return stacktrace.Propagate(err, "virtual disk attachment failed")
}
}
// Wait for disk to be attached
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return errors.Errorf("disk attach check failed, err: %v", err)
// Wait for disk to be attached
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk attachment check failed")
}
}
}
}
// Updating the result details
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
// Updating the result details
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
}
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@ -175,8 +184,10 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
return nil
}
//injectChaosInSerialMode will inject the azure disk loss chaos in serial mode that is one after other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode will inject the Azure disk loss chaos in serial mode that is one after other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@ -185,7 +196,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on azure virtual disks"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disks"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
@ -198,13 +209,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// Detaching the virtual disks
log.Infof("[Chaos]: Detaching %v from the instance", diskName)
if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameToList); err != nil {
return errors.Errorf("failed to detach disks, err: %v", err)
return stacktrace.Propagate(err, "failed to detach disks")
}
// Waiting for disk to be detached
log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
return errors.Errorf("disk detach check failed, err: %v", err)
return stacktrace.Propagate(err, "disk detachment check failed")
}
common.SetTargets(diskName, "detached", "VirtualDisk", chaosDetails)
@ -212,8 +223,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -224,13 +235,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Attaching the virtual disks to the instance
log.Infof("[Chaos]: Attaching %v back to the instance", diskName)
if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, attachedDisksWithInstance[instanceName]); err != nil {
return errors.Errorf("disk attachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk attachment failed")
}
// Waiting for disk to be attached
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return errors.Errorf("disk attach check failed, err: %v", err)
return stacktrace.Propagate(err, "disk attachment check failed")
}
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
@ -257,10 +268,10 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, attache
Try(func(attempt uint) error {
status, err := instanceStatus.GetAzureInstanceProvisionStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet)
if err != nil {
return errors.Errorf("Failed to get instance, err: %v", err)
return stacktrace.Propagate(err, "failed to get instance")
}
if status != "Provisioning succeeded" {
return errors.Errorf("instance is updating, waiting for instance to finish update")
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: "instance is updating, waiting for instance to finish update"}
}
return nil
})
@ -271,11 +282,11 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, attache
for _, disk := range *diskList {
diskStatusString, err := diskStatus.GetDiskStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, *disk.Name)
if err != nil {
log.Errorf("Failed to get disk status, err: %v", err)
log.Errorf("Failed to get disk status: %v", err)
}
if diskStatusString != "Attached" {
if err := diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskList); err != nil {
log.Errorf("failed to attach disk '%v, manual revert required, err: %v", err)
log.Errorf("Failed to attach disk, manual revert required: %v", err)
} else {
common.SetTargets(*disk.Name, "re-attached", "VirtualDisk", chaosDetails)
}


@ -1,6 +1,8 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
@ -8,15 +10,18 @@ import (
"time"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/instance-stop/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
azureCommon "github.com/litmuschaos/litmus-go/pkg/cloud/azure/common"
azureStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@ -25,7 +30,9 @@ var (
)
// PrepareAzureStop will initialize instanceNameList and start chaos injection based on sequence method selected
func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareAzureStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications
inject = make(chan os.Signal, 1)
@ -43,9 +50,9 @@ func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, cli
}
// get the instance name or list of instance names
instanceNameList := strings.Split(experimentsDetails.AzureInstanceName, ",")
if len(instanceNameList) == 0 {
return errors.Errorf("no instance name found to stop")
instanceNameList := strings.Split(experimentsDetails.AzureInstanceNames, ",")
if experimentsDetails.AzureInstanceNames == "" || len(instanceNameList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance name found to stop"}
}
// watching for the abort signal and revert the chaos
@ -53,15 +60,15 @@ func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, cli
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
@ -72,8 +79,11 @@ func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, cli
return nil
}
// injectChaosInSerialMode will inject the azure instance termination in serial mode that is one after the other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode will inject the Azure instance termination in serial mode that is one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
@ -88,7 +98,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Infof("[Info]: Target instanceName list, %v", instanceNameList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on azure instance"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
@ -100,25 +110,25 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to stop the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to stop the Azure instance")
}
}
// Wait for Azure instance to completely stop
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance poweroff status check failed, err: %v", err)
return stacktrace.Propagate(err, "instance poweroff status check failed")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -130,18 +140,18 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Info("[Chaos]: Starting back the Azure instance")
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
}
// Wait for Azure instance to get in running state
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance power on status check failed, err: %v", err)
return stacktrace.Propagate(err, "instance power on status check failed")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@ -150,8 +160,11 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
return nil
}
// injectChaosInParallelMode will inject the azure instance termination in parallel mode that is all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInParallelMode will inject the Azure instance termination in parallel mode that is all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInParallelMode")
defer span.End()
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
@ -177,11 +190,11 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to stop Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to stop Azure instance")
}
}
}
@ -190,14 +203,14 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance poweroff status check failed, err: %v", err)
return stacktrace.Propagate(err, "instance poweroff status check failed")
}
}
// Run probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -210,11 +223,11 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Infof("[Chaos]: Starting back the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
}
}
@@ -223,7 +236,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance power on status check failed, err: %v", err)
return stacktrace.Propagate(err, "instance power on status check failed")
}
}
@@ -248,22 +261,22 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
instanceState, err = azureStatus.GetAzureInstanceStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
}
if err != nil {
log.Errorf("[Abort]: Fail to get instance status when an abort signal is received, err: %v", err)
log.Errorf("[Abort]: Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "VM running" && instanceState != "VM starting" {
log.Info("[Abort]: Waiting for the Azure instance to get down")
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Instance power off status check failed, err: %v", err)
log.Errorf("[Abort]: Instance power off status check failed: %v", err)
}
log.Info("[Abort]: Starting Azure instance as abort signal received")
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance, err: %v", err)
log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance, err: %v", err)
log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
}
}
}
@@ -271,7 +284,7 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
log.Info("[Abort]: Waiting for the Azure instance to start")
err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
if err != nil {
log.Errorf("[Abort]: Instance power on status check failed, err: %v", err)
log.Errorf("[Abort]: Instance power on status check failed: %v", err)
log.Errorf("[Abort]: Azure instance %v failed to start after an abort signal is received", vmName)
}
}
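The abort path above repeatedly checks instance state via `WaitForAzureComputeDown`/`WaitForAzureComputeUp`, which poll until a timeout elapses. A minimal sketch of that polling pattern, with an illustrative `pollUntil` helper (not the actual litmus-go API):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pollUntil retries check every delay seconds until it reports done,
// giving up once timeout seconds have elapsed. Illustrative only.
func pollUntil(timeout, delay int, check func() (bool, error)) error {
	deadline := time.Now().Add(time.Duration(timeout) * time.Second)
	for {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for desired state")
		}
		time.Sleep(time.Duration(delay) * time.Second)
	}
}

func main() {
	calls := 0
	err := pollUntil(5, 0, func() (bool, error) {
		calls++
		return calls >= 3, nil // reaches the desired state on the third poll
	})
	fmt.Println(err == nil, calls) // true 3
}
```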

@@ -1,27 +1,38 @@
package helper
import (
"bytes"
"context"
"fmt"
"os/exec"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/openebs/maya/pkg/util/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var err error
// Helper injects the container-kill chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateContainerKillFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -32,14 +43,18 @@ func Helper(clients clients.ClientSets) {
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Intialise the chaos attributes
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise Chaos Result Parameters
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
err := killContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
if err := killContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
@@ -48,6 +63,33 @@ func Helper(clients clients.ClientSets) {
// it will kill the container till the chaos duration
// the execution will stop after timestamp passes the given chaos duration
func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
targets = append(targets, td)
log.Infof("Injecting chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
if err := killIterations(targets, experimentsDetails, clients, eventsDetails, chaosDetails, resultDetails); err != nil {
return err
}
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil
}
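The helper obtains its targets from `common.ParseTargets`, which decodes the serialized target list that the experiment library builds as `name:namespace:container` entries joined by `;` (see the `fmt.Sprintf("%s:%s:%s", ...)` calls in the lib changes further down). A rough round-trip sketch of that encoding, assuming only these three fields (`encodeTargets`/`decodeTargets` are illustrative names, not the actual helpers):

```go
package main

import (
	"fmt"
	"strings"
)

type target struct {
	Name, Namespace, Container string
}

// encodeTargets joins targets as "name:namespace:container;..." entries.
func encodeTargets(ts []target) string {
	parts := make([]string, 0, len(ts))
	for _, t := range ts {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	return strings.Join(parts, ";")
}

// decodeTargets reverses encodeTargets, skipping malformed entries.
func decodeTargets(s string) []target {
	var ts []target
	for _, p := range strings.Split(s, ";") {
		f := strings.Split(p, ":")
		if len(f) != 3 {
			continue
		}
		ts = append(ts, target{Name: f[0], Namespace: f[1], Container: f[2]})
	}
	return ts
}

func main() {
	s := encodeTargets([]target{{"nginx-0", "default", "nginx"}, {"nginx-1", "default", "nginx"}})
	fmt.Println(s)                       // nginx-0:default:nginx;nginx-1:default:nginx
	fmt.Println(len(decodeTargets(s)))   // 2
}
```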
func killIterations(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
@@ -55,43 +97,30 @@ func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, client
for duration < experimentsDetails.ChaosDuration {
//getRestartCount return the restart count of target container
restartCountBefore, err := getRestartCount(experimentsDetails, experimentsDetails.TargetPods, clients)
if err != nil {
return err
}
var containerIds []string
//Obtain the container ID through Pod
// this id will be used to select the container for the kill
containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients)
if err != nil {
return errors.Errorf("Unable to get the container id, %v", err)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": experimentsDetails.TargetPods,
"ContainerName": experimentsDetails.TargetContainer,
"RestartCountBefore": restartCountBefore,
})
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch experimentsDetails.ContainerRuntime {
case "docker":
if err := stopDockerContainer(containerID, experimentsDetails.SocketPath, experimentsDetails.Signal); err != nil {
return err
for _, t := range targets {
t.RestartCountBefore, err = getRestartCount(t, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get container restart count")
}
case "containerd", "crio":
if err := stopContainerdContainer(containerID, experimentsDetails.SocketPath, experimentsDetails.Signal); err != nil {
return err
containerId, err := common.GetContainerID(t.Namespace, t.Name, t.TargetContainer, clients, t.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
default:
return errors.Errorf("%v container runtime not supported", experimentsDetails.ContainerRuntime)
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": t.Name,
"ContainerName": t.TargetContainer,
"RestartCountBefore": t.RestartCountBefore,
})
containerIds = append(containerIds, containerId)
}
if err := kill(experimentsDetails, containerIds, clients, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not kill target container")
}
//Waiting for the chaos interval after chaos injection
@@ -100,67 +129,93 @@ func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, client
common.WaitForDuration(experimentsDetails.ChaosInterval)
}
//Check the status of restarted container
err = common.CheckContainerStatus(experimentsDetails.AppNS, experimentsDetails.TargetPods, clients)
if err != nil {
return errors.Errorf("application container is not in running state, %v", err)
for _, t := range targets {
if err := validate(t, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not verify restart count")
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "targeted", "pod", t.Name); err != nil {
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// It will verify that the restart count of container should increase after chaos injection
err = verifyRestartCount(experimentsDetails, experimentsDetails.TargetPods, clients, restartCountBefore)
if err != nil {
return err
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "targeted", "pod", experimentsDetails.TargetPods); err != nil {
return nil
}
func kill(experimentsDetails *experimentTypes.ExperimentDetails, containerIds []string, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch experimentsDetails.ContainerRuntime {
case "docker":
if err := stopDockerContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
case "containerd", "crio":
if err := stopContainerdContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName, experimentsDetails.Timeout); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported container runtime %s", experimentsDetails.ContainerRuntime)}
}
return nil
}
func validate(t targetDetails, timeout, delay int, clients clients.ClientSets) error {
//Check the status of restarted container
if err := common.CheckContainerStatus(t.Namespace, t.Name, timeout, delay, clients, t.Source); err != nil {
return err
}
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil
// verify that the restart count of the container has increased after chaos injection
return verifyRestartCount(t, timeout, delay, clients, t.RestartCountBefore)
}
//stopContainerdContainer kill the application container
func stopContainerdContainer(containerID, socketPath, signal string) error {
var errOut bytes.Buffer
var cmd *exec.Cmd
endpoint := "unix://" + socketPath
switch signal {
case "SIGKILL":
cmd = exec.Command("sudo", "crictl", "-i", endpoint, "-r", endpoint, "stop", "--timeout=0", string(containerID))
case "SIGTERM":
cmd = exec.Command("sudo", "crictl", "-i", endpoint, "-r", endpoint, "stop", string(containerID))
default:
return errors.Errorf("{%v} signal not supported, use either SIGTERM or SIGKILL", signal)
// stopContainerdContainer kills the application container
func stopContainerdContainer(containerIDs []string, socketPath, signal, source string, timeout int) error {
if signal != "SIGKILL" && signal != "SIGTERM" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: source, Reason: fmt.Sprintf("unsupported signal %s, use either SIGTERM or SIGKILL", signal)}
}
cmd.Stderr = &errOut
if err := cmd.Run(); err != nil {
return errors.Errorf("Unable to run command, err: %v; error output: %v", err, errOut.String())
cmd := exec.Command("sudo", "crictl", "-i", fmt.Sprintf("unix://%s", socketPath), "-r", fmt.Sprintf("unix://%s", socketPath), "stop")
if signal == "SIGKILL" {
cmd.Args = append(cmd.Args, "--timeout=0")
} else if timeout != -1 {
cmd.Args = append(cmd.Args, fmt.Sprintf("--timeout=%v", timeout))
}
return nil
cmd.Args = append(cmd.Args, containerIDs...)
return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
}
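The `crictl stop` command assembled above varies only in its timeout flag: `SIGKILL` forces `--timeout=0` for an immediate kill, while `SIGTERM` passes the configured container-API timeout unless it is the `-1` default. A sketch of that argument construction (an illustrative helper, not the actual litmus-go function):

```go
package main

import (
	"fmt"
	"strings"
)

// crictlStopArgs builds the crictl argument list used to stop containers.
// signal must be SIGKILL or SIGTERM; a timeout of -1 means "no explicit flag".
func crictlStopArgs(socketPath, signal string, timeout int, ids []string) []string {
	endpoint := fmt.Sprintf("unix://%s", socketPath)
	args := []string{"crictl", "-i", endpoint, "-r", endpoint, "stop"}
	if signal == "SIGKILL" {
		args = append(args, "--timeout=0") // kill immediately
	} else if timeout != -1 {
		args = append(args, fmt.Sprintf("--timeout=%v", timeout))
	}
	return append(args, ids...)
}

func main() {
	args := crictlStopArgs("/run/containerd/containerd.sock", "SIGKILL", -1, []string{"abc123"})
	// crictl -i unix:///run/containerd/containerd.sock -r unix:///run/containerd/containerd.sock stop --timeout=0 abc123
	fmt.Println(strings.Join(args, " "))
}
```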
//stopDockerContainer kill the application container
func stopDockerContainer(containerID, socketPath, signal string) error {
var errOut bytes.Buffer
host := "unix://" + socketPath
cmd := exec.Command("sudo", "docker", "--host", host, "kill", string(containerID), "--signal", signal)
cmd.Stderr = &errOut
if err := cmd.Run(); err != nil {
return errors.Errorf("Unable to run command, err: %v; error output: %v", err, errOut.String())
}
return nil
// stopDockerContainer kills the application container
func stopDockerContainer(containerIDs []string, socketPath, signal, source string) error {
cmd := exec.Command("sudo", "docker", "--host", fmt.Sprintf("unix://%s", socketPath), "kill", "--signal", signal)
cmd.Args = append(cmd.Args, containerIDs...)
return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
}
//getRestartCount return the restart count of target container
func getRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets) (int, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podName, v1.GetOptions{})
// getRestartCount returns the restart count of the target container
func getRestartCount(target targetDetails, clients clients.ClientSets) (int, error) {
pod, err := clients.GetPod(target.Namespace, target.Name, 180, 2)
if err != nil {
return 0, err
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", target.Name, target.Namespace), Reason: err.Error()}
}
restartCount := 0
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
if container.Name == target.TargetContainer {
restartCount = int(container.RestartCount)
break
}
@@ -168,39 +223,36 @@ func getRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podN
return restartCount, nil
}
//verifyRestartCount verify the restart count of target container that it is restarted or not after chaos injection
func verifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, restartCountBefore int) error {
// verifyRestartCount verifies that the target container was restarted after chaos injection
func verifyRestartCount(t targetDetails, timeout, delay int, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0
return retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podName, v1.GetOptions{})
pod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to find the pod with name %v, err: %v", podName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
if container.Name == t.TargetContainer {
restartCountAfter = int(container.RestartCount)
break
}
}
if restartCountAfter <= restartCountBefore {
return errors.Errorf("Target container is not restarted")
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: "target container is not restarted after kill"}
}
log.Infof("restartCount of target container after chaos injection: %v", strconv.Itoa(restartCountAfter))
return nil
})
}
//getENV fetches all the env variables from the runner pod
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
@@ -212,4 +264,17 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.Signal = types.Getenv("SIGNAL", "SIGKILL")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.ContainerAPITimeout, _ = strconv.Atoi(types.Getenv("CONTAINER_API_TIMEOUT", "-1"))
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
RestartCountBefore int
Source string
}
func isContextDeadlineExceeded(err error) bool {
return strings.Contains(err.Error(), "context deadline exceeded")
}

@@ -1,40 +1,52 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareContainerKill contains the prepration steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareContainerKillFault")
defer span.End()
var err error
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//Set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
log.InfoWithValues("[Info]: The tunables are:", logrus.Fields{
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -46,35 +58,28 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
}
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -86,13 +91,12 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
}
// injectChaosInSerialMode kill the container of all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -100,98 +104,64 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
runID := stringutils.GetRunID()
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode kill the container of all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
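In parallel mode the experiment now creates one helper pod per node and hands it every co-located target, using `common.FilterPodsForNodes` to bucket the target pods by node. A minimal sketch of that grouping, assuming a simplified pod record (`podInfo` and `groupByNode` are illustrative names):

```go
package main

import "fmt"

type podInfo struct {
	Name, Namespace, Node, Container string
}

// groupByNode buckets target pods by the node they run on, so a single
// helper pod per node can service all co-located targets.
func groupByNode(pods []podInfo) map[string][]podInfo {
	out := make(map[string][]podInfo)
	for _, p := range pods {
		out[p.Node] = append(out[p.Node], p)
	}
	return out
}

func main() {
	pods := []podInfo{
		{"a", "default", "node-1", "app"},
		{"b", "default", "node-1", "app"},
		{"c", "default", "node-2", "app"},
	}
	groups := groupByNode(pods)
	fmt.Println(len(groups), len(groups["node-1"]), len(groups["node-2"])) // 2 2 1
}
```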
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateContainerKillFaultHelperPod")
defer span.End()
privilegedEnable := false
if experimentsDetails.ContainerRuntime == "crio" {
@@ -201,10 +171,10 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
@@ -235,7 +205,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name container-kill",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
@@ -250,17 +220,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env vars required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@@ -272,8 +248,18 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("STATUS_CHECK_DELAY", strconv.Itoa(experimentsDetails.Delay)).
SetEnv("STATUS_CHECK_TIMEOUT", strconv.Itoa(experimentsDetails.Timeout)).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("CONTAINER_API_TIMEOUT", strconv.Itoa(experimentsDetails.ContainerAPITimeout)).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
// SetChaosTunables will set up a random value within the given range of values
// If the value is not provided as a range, it'll use the provided value itself.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
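For readers following the TARGETS plumbing above: each target pod is encoded as a `name:namespace:container` triplet, and triplets for one node are joined with `;` before being handed to the helper via the `TARGETS` env var. The real helper-side parsing lives in `common.ParseTargets`; the standalone sketch below only illustrates the encoding, and the `target`/`encodeTargets`/`decodeTargets` names are illustrative, not part of the litmus-go API.

```go
package main

import (
	"fmt"
	"strings"
)

// target mirrors the name:namespace:container triplets passed to the
// helper pod via the TARGETS env var (names here are illustrative).
type target struct {
	Name, Namespace, Container string
}

// encodeTargets joins per-pod triplets with ";" the way the lib builds
// the helper pod's TARGETS value.
func encodeTargets(ts []target) string {
	parts := make([]string, 0, len(ts))
	for _, t := range ts {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	return strings.Join(parts, ";")
}

// decodeTargets reverses the encoding on the helper side.
func decodeTargets(s string) ([]target, error) {
	var ts []target
	for _, p := range strings.Split(s, ";") {
		f := strings.Split(p, ":")
		if len(f) != 3 {
			return nil, fmt.Errorf("malformed target %q", p)
		}
		ts = append(ts, target{f[0], f[1], f[2]})
	}
	return ts, nil
}

func main() {
	enc := encodeTargets([]target{{"nginx-0", "default", "nginx"}, {"nginx-1", "default", "nginx"}})
	fmt.Println(enc) // nginx-0:default:nginx;nginx-1:default:nginx
	ts, _ := decodeTargets(enc)
	fmt.Println(len(ts), ts[1].Name) // 2 nginx-1
}
```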


@@ -1,6 +1,7 @@
package helper
import (
"context"
"fmt"
"os"
"os/exec"
@@ -10,6 +11,11 @@ import (
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
@@ -17,7 +23,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"k8s.io/apimachinery/pkg/api/resource"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -27,7 +32,9 @@ import (
var inject, abort chan os.Signal
// Helper injects the disk-fill chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateDiskFillFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -50,6 +57,7 @@ func Helper(clients clients.ClientSets) {
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
@@ -58,57 +66,58 @@ func Helper(clients clients.ClientSets) {
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := diskFill(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// diskFill contains steps to inject disk-fill chaos
func diskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
// Derive the container id of the target container
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.TargetPID, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return err
}
td.SizeToFill, err = getDiskSizeToFill(td, experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get disk size to fill")
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"SizeToFill(KB)": td.SizeToFill,
"TargetContainer": td.TargetContainer,
})
targets = append(targets, td)
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
@@ -118,65 +127,80 @@ func diskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients cli
}
// watching for the abort signal and revert the chaos
go abortWatcher(targets, experimentsDetails, clients, resultDetails.Name)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
for _, t := range targets {
if t.SizeToFill > 0 {
if err := fillDisk(t, experimentsDetails.DataBlockSize); err != nil {
return stacktrace.Propagate(err, "could not fill ephemeral storage")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertDiskFill(t, clients); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
} else {
log.Warn("No required free space found!")
}
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment")
var errList []string
for _, t := range targets {
// It will delete the target pod if target pod is evicted
// if target pod is still running then it will delete all the files, which was created earlier during chaos execution
if err = revertDiskFill(t, clients); err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// fillDisk fills the ephemeral disk by creating files
func fillDisk(t targetDetails, bs int) error {
// Creating files to fill the required ephemeral storage size of block size of 4K
log.Infof("[Fill]: Filling ephemeral storage, size: %vKB", t.SizeToFill)
dd := fmt.Sprintf("sudo dd if=/dev/urandom of=/proc/%v/root/home/diskfill bs=%vK count=%v", t.TargetPID, bs, strconv.Itoa(t.SizeToFill/bs))
log.Infof("dd: {%v}", dd)
cmd := exec.Command("/bin/bash", "-c", dd)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: string(out)}
}
return nil
}
// getEphemeralStorageAttributes derives the ephemeral storage attributes from the target pod
func getEphemeralStorageAttributes(t targetDetails, clients clients.ClientSets) (int64, error) {
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
var ephemeralStorageLimit int64
@@ -185,7 +209,7 @@ func getEphemeralStorageAttributes(experimentsDetails *experimentTypes.Experimen
// Extracting ephemeral storage limit & requested value from the target container
// It will be in the form of Kb
for _, container := range containers {
if container.Name == experimentsDetails.TargetContainer {
if container.Name == t.TargetContainer {
ephemeralStorageLimit = container.Resources.Limits.StorageEphemeral().ToDec().ScaledValue(resource.Kilo)
break
}
@@ -202,7 +226,7 @@ func filterUsedEphemeralStorage(ephemeralStorageDetails string) (int, error) {
ephemeralStorageAll := strings.Split(ephemeralStorageDetails, "\n")
// It will return the details of main directory
ephemeralStorageAllDiskFill := strings.Split(ephemeralStorageAll[len(ephemeralStorageAll)-2], "\t")[0]
// type casting string to integer
ephemeralStorageSize, err := strconv.Atoi(ephemeralStorageAllDiskFill)
return ephemeralStorageSize, err
}
@@ -213,62 +237,64 @@ func getSizeToBeFilled(experimentsDetails *experimentTypes.ExperimentDetails, us
switch ephemeralStorageLimit {
case 0:
ephemeralStorageMebibytes, _ := strconv.Atoi(experimentsDetails.EphemeralStorageMebibytes)
requirementToBeFill = ephemeralStorageMebibytes * 1024
default:
// deriving the size to be filled from the used size & the required size to fill
fillPercentage, _ := strconv.Atoi(experimentsDetails.FillPercentage)
requirementToBeFill = (ephemeralStorageLimit * fillPercentage) / 100
}
needToBeFilled := requirementToBeFill - usedEphemeralStorageSize
return needToBeFilled
}
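The size derivation in `getSizeToBeFilled` is plain integer arithmetic: fill `FillPercentage` of the ephemeral-storage limit when one exists, otherwise fall back to `EPHEMERAL_STORAGE_MEBIBYTES` converted to KB, and subtract what is already used. A standalone sketch with illustrative values (the function name and numbers below are not from the source):

```go
package main

import "fmt"

// sizeToBeFilled mirrors the derivation in getSizeToBeFilled: with an
// ephemeral-storage limit, fill fillPercentage of the limit; without one,
// fall back to a MiB value converted to KB. Used space is subtracted in
// both cases. All sizes are in KB unless noted.
func sizeToBeFilled(limitKB, usedKB, fillPercentage, fallbackMiB int) int {
	var requirement int
	if limitKB == 0 {
		requirement = fallbackMiB * 1024 // MiB -> KB
	} else {
		requirement = (limitKB * fillPercentage) / 100
	}
	return requirement - usedKB
}

func main() {
	// 2,000,000 KB limit, 80% fill, 200,000 KB already used -> 1,400,000 KB
	fmt.Println(sizeToBeFilled(2000000, 200000, 80, 0))
	// no limit: fall back to 500 MiB = 512,000 KB, minus 12,000 KB used
	fmt.Println(sizeToBeFilled(0, 12000, 80, 500))
}
```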
// revertDiskFill will delete the target pod if target pod is evicted
// if target pod is still running then it will delete the files, which were created during chaos execution
func revertDiskFill(t targetDetails, clients clients.ClientSets) error {
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
podReason := pod.Status.Reason
if podReason == "Evicted" {
// Deleting the pod as pod is already evicted
log.Warn("Target pod is evicted, deleting the pod")
if err := clients.KubeClient.CoreV1().Pods(t.Namespace).Delete(context.Background(), t.Name, v1.DeleteOptions{}); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to delete target pod after eviction :%s", err.Error())}
}
} else {
// deleting the files after chaos execution
rm := fmt.Sprintf("sudo rm -rf /proc/%v/root/home/diskfill", t.TargetPID)
cmd := exec.Command("/bin/bash", "-c", rm)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to cleanup ephemeral storage: %s", string(out))}
}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
return nil
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.FillPercentage = types.Getenv("FILL_PERCENTAGE", "")
experimentDetails.EphemeralStorageMebibytes = types.Getenv("EPHEMERAL_STORAGE_MEBIBYTES", "")
experimentDetails.DataBlockSize, _ = strconv.Atoi(types.Getenv("DATA_BLOCK_SIZE", "256"))
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultName string) {
// waiting till the abort signal received
<-abort
@@ -277,15 +303,72 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
err := revertDiskFill(t, clients)
if err != nil {
log.Errorf("unable to kill disk-fill process, err :%v", err)
continue
}
if err = result.AnnotateChaosResult(resultName, experimentsDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
func getDiskSizeToFill(t targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int, error) {
usedEphemeralStorageSize, err := getUsedEphemeralStorage(t)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get used ephemeral storage")
}
// GetEphemeralStorageAttributes derive the ephemeral storage attributes from the target container
ephemeralStorageLimit, err := getEphemeralStorageAttributes(t, clients)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get ephemeral storage attributes")
}
if ephemeralStorageLimit == 0 && experimentsDetails.EphemeralStorageMebibytes == "0" {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: "either provide ephemeral storage limit inside target container or define EPHEMERAL_STORAGE_MEBIBYTES ENV"}
}
// deriving the ephemeral storage size to be filled
sizeTobeFilled := getSizeToBeFilled(experimentsDetails, usedEphemeralStorageSize, int(ephemeralStorageLimit))
return sizeTobeFilled, nil
}
func getUsedEphemeralStorage(t targetDetails) (int, error) {
// derive the used ephemeral storage size from the target container
du := fmt.Sprintf("sudo du /proc/%v/root", t.TargetPID)
cmd := exec.Command("/bin/bash", "-c", du)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", string(out))}
}
ephemeralStorageDetails := string(out)
// filtering out the used ephemeral storage from the output of du command
usedEphemeralStorageSize, err := filterUsedEphemeralStorage(ephemeralStorageDetails)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", err.Error())}
}
log.Infof("used ephemeral storage space: %vKB", strconv.Itoa(usedEphemeralStorageSize))
return usedEphemeralStorageSize, nil
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
SizeToFill int
TargetPID int
Source string
}


@@ -1,43 +1,57 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareDiskFill contains the preparation steps before chaos injection
func PrepareDiskFill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDiskFillFault")
defer span.End()
var err error
// It will contain all the pod & container details required for exec command
execCommandDetails := exec.PodDetails{}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"FillPercentage": experimentsDetails.FillPercentage,
"EphemeralStorageMebibytes": experimentsDetails.EphemeralStorageMebibytes,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -45,39 +59,32 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
common.WaitForDuration(experimentsDetails.RampTime)
}
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -89,46 +96,33 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
}
// injectChaosInSerialMode fills the ephemeral storage of all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform disk-fill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@@ -137,77 +131,69 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode fill the ephemeral storage of all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform disk-fill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
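The parallel-mode loop above packs each target as `name:namespace:container` and joins one node's entries with `;` before handing them to the helper pod via the `TARGETS` env var. A minimal stdlib sketch of that encoding (the `target` struct and `encodeTargets` helper here are illustrative stand-ins, not litmus-go types):

```go
package main

import (
	"fmt"
	"strings"
)

// target is an illustrative stand-in for the pod fields used above.
type target struct {
	Name, Namespace, Container string
}

// encodeTargets joins one node's targets into the "name:ns:container;..." form.
func encodeTargets(targets []target) string {
	parts := make([]string, 0, len(targets))
	for _, t := range targets {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	return strings.Join(parts, ";")
}

func main() {
	ts := []target{
		{"nginx-0", "default", "nginx"},
		{"nginx-1", "default", "nginx"},
	}
	fmt.Println(encodeTargets(ts)) // nginx-0:default:nginx;nginx-1:default:nginx
}
```

One helper pod per node then decodes this string to act on all of its co-located targets at once.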
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, appNodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDiskFillFaultHelperPod")
defer span.End()
mountPropagationMode := apiv1.MountPropagationHostToContainer
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Volumes: []apiv1.Volume{
{
Name: "udev",
Name: "socket-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.ContainerPath,
Path: experimentsDetails.SocketPath,
},
},
},
@@ -225,40 +211,62 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name disk-fill",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, appName),
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "udev",
MountPath: "/diskfill",
MountPropagation: &mountPropagationMode,
Name: "socket-path",
MountPath: experimentsDetails.SocketPath,
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derive all the env required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
SetEnv("APP_POD", podName).
envDetails.SetEnv("TARGETS", targets).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("FILL_PERCENTAGE", strconv.Itoa(experimentsDetails.FillPercentage)).
SetEnv("EPHEMERAL_STORAGE_MEBIBYTES", strconv.Itoa(experimentsDetails.EphemeralStorageMebibytes)).
SetEnv("FILL_PERCENTAGE", experimentsDetails.FillPercentage).
SetEnv("EPHEMERAL_STORAGE_MEBIBYTES", experimentsDetails.EphemeralStorageMebibytes).
SetEnv("DATA_BLOCK_SIZE", strconv.Itoa(experimentsDetails.DataBlockSize)).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
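getPodEnv relies on the chained `SetEnv` builder from the common package. The pattern can be sketched with the stdlib only; `EnvVar` and `ENVDetails` below are simplified stand-ins, and skipping empty values (so unset tunables like `INSTANCE_ID` emit nothing) is an assumption about the common package's behavior:

```go
package main

import "fmt"

// EnvVar mirrors a name/value pair destined for the helper pod spec.
type EnvVar struct {
	Name, Value string
}

// ENVDetails accumulates env vars for the helper pod.
type ENVDetails struct {
	ENV []EnvVar
}

// SetEnv appends a var and returns the receiver so calls can be chained.
// Empty values are skipped (assumed behavior for optional tunables).
func (e *ENVDetails) SetEnv(key, value string) *ENVDetails {
	if value != "" {
		e.ENV = append(e.ENV, EnvVar{Name: key, Value: value})
	}
	return e
}

func main() {
	var env ENVDetails
	env.SetEnv("TARGETS", "nginx:default:nginx").
		SetEnv("CHAOS_NAMESPACE", "litmus").
		SetEnv("INSTANCE_ID", "") // dropped: empty value
	fmt.Println(len(env.ENV)) // 2
}
```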
// setChaosTunables will setup a random value within a given range of values
// If the value is not provided in range it'll setup the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.FillPercentage = common.ValidateRange(experimentsDetails.FillPercentage)
experimentsDetails.EphemeralStorageMebibytes = common.ValidateRange(experimentsDetails.EphemeralStorageMebibytes)
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}


@@ -1,31 +1,38 @@
package lib
import (
"context"
"fmt"
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/docker-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareDockerServiceKill contains prepration steps before chaos injection
func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareDockerServiceKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDockerServiceKillFault")
defer span.End()
var err error
if experimentsDetails.TargetNode == "" {
//Select node for docker-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@@ -33,7 +40,7 @@ func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDeta
"NodeName": experimentsDetails.TargetNode,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -48,53 +55,20 @@ func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDeta
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
// Creating the helper pod to perform docker-service-kill
if err = createHelperPod(experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
}
// Checking for the node to be in not-ready state
log.Info("[Status]: Check for the node to be in NotReady state")
if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("application node is not in NotReady state, err: %v", err)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@@ -106,7 +80,9 @@ func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDeta
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDockerServiceKillFaultHelperPod")
defer span.End()
privileged := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
@@ -115,7 +91,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, "", experimentsDetails.ExperimentName),
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
@@ -187,8 +163,16 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
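The `cerrors.Error` values returned above carry an error code, a reason, and sometimes a target. A minimal stdlib sketch of such a structured error type (field names mirror the usage above; the message formatting is an assumption, not the library's exact output):

```go
package main

import (
	"errors"
	"fmt"
)

// ChaosError is an illustrative structured error, similar in shape to cerrors.Error.
type ChaosError struct {
	ErrorCode string
	Reason    string
	Target    string
}

// Error renders the code, reason, and optional target into one message.
func (e ChaosError) Error() string {
	if e.Target == "" {
		return fmt.Sprintf("%s: %s", e.ErrorCode, e.Reason)
	}
	return fmt.Sprintf("%s: %s [target: %s]", e.ErrorCode, e.Reason, e.Target)
}

func main() {
	err := error(ChaosError{
		ErrorCode: "CHAOS_INJECT",
		Reason:    "Volume not attached to any instance",
		Target:    "EBS Volume ID: vol-123",
	})
	fmt.Println(err)

	// Callers can recover the structured fields with errors.As.
	var ce ChaosError
	fmt.Println(errors.As(err, &ce)) // true
}
```

Keeping the code and target as fields (rather than baking them into a string) lets callers branch on the failure class while still logging a readable message.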
func ptrint64(p int64) *int64 {


@@ -1,18 +1,23 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@@ -20,8 +25,10 @@ var (
inject, abort chan os.Signal
)
//PrepareEBSLossByID contains the prepration and injection steps for the experiment
func PrepareEBSLossByID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareEBSLossByID contains the prepration and injection steps for the experiment
func PrepareEBSLossByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -48,22 +55,22 @@ func PrepareEBSLossByID(experimentsDetails *experimentTypes.ExperimentDetails, c
//get the volume id or list of instance ids
volumeIDList := strings.Split(experimentsDetails.EBSVolumeID, ",")
if len(volumeIDList) == 0 {
return errors.Errorf("no volume id found to detach")
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume id found to detach"}
}
// watching for the abort signal and revert the chaos
go ebsloss.AbortWatcher(experimentsDetails, volumeIDList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = ebsloss.InjectChaosInSerialMode(experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = ebsloss.InjectChaosInParallelMode(experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection


@@ -1,18 +1,23 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@@ -20,8 +25,10 @@ var (
inject, abort chan os.Signal
)
//PrepareEBSLossByTag contains the prepration and injection steps for the experiment
func PrepareEBSLossByTag(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareEBSLossByTag contains the prepration and injection steps for the experiment
func PrepareEBSLossByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -53,15 +60,15 @@ func PrepareEBSLossByTag(experimentsDetails *experimentTypes.ExperimentDetails,
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = ebsloss.InjectChaosInSerialMode(experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = ebsloss.InjectChaosInParallelMode(experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {


@@ -1,22 +1,29 @@
package lib
import (
"context"
"fmt"
"os"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
ebs "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ebs"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
//InjectChaosInSerialMode will inject the ebs loss chaos in serial mode which means one after other
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// InjectChaosInSerialMode will inject the ebs loss chaos in serial mode which means one after other
func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@@ -34,13 +41,13 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get volume attachment details
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
return errors.Errorf("fail to get the attachment info, err: %v", err)
return stacktrace.Propagate(err, "failed to get the attachment info")
}
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err = ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
return errors.Errorf("ebs detachment failed, err: %v", err)
return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
@@ -48,14 +55,14 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err = ebs.WaitForVolumeDetachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the ebs volume to the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ebs detachment failed")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -66,7 +73,7 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceID, experimentsDetails.Region)
if err != nil {
return errors.Errorf("failed to get the ebs status, err: %v", err)
return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
@@ -76,13 +83,13 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Attaching the ebs volume from the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceID, device, experimentsDetails.Region); err != nil {
return errors.Errorf("ebs attachment failed, err: %v", err)
return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for %v volume", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the ebs volume to the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
@@ -92,8 +99,10 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
return nil
}
//InjectChaosInParallelMode will inject the chaos in parallel mode that means all at once
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// InjectChaosInParallelMode will inject the chaos in parallel mode that means all at once
func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInParallelMode")
defer span.End()
var ec2InstanceIDList, deviceList []string
@@ -112,8 +121,15 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//prepare the instaceIDs and device name for all the given volume
for _, volumeID := range targetEBSVolumeIDList {
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil || ec2InstanceID == "" || device == "" {
return errors.Errorf("fail to get the attachment info, err: %v", err)
if err != nil {
return stacktrace.Propagate(err, "failed to get the attachment info")
}
if ec2InstanceID == "" || device == "" {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: "Volume not attached to any instance",
Target: fmt.Sprintf("EBS Volume ID: %v", volumeID),
}
}
ec2InstanceIDList = append(ec2InstanceIDList, ec2InstanceID)
deviceList = append(deviceList, device)
@@ -123,28 +139,28 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err := ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
return errors.Errorf("ebs detachment failed, err: %v", err)
return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
}
log.Info("[Info]: Checking if the detachment process initiated")
if err := ebs.CheckEBSDetachmentInitialisation(targetEBSVolumeIDList, ec2InstanceIDList, experimentsDetails.Region); err != nil {
return errors.Errorf("fail to initialise the detachment")
return stacktrace.Propagate(err, "failed to initialise the detachment")
}
for i, volumeID := range targetEBSVolumeIDList {
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err := ebs.WaitForVolumeDetachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the ebs volume to the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ebs detachment failed")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -157,7 +173,7 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceIDList[i], experimentsDetails.Region)
if err != nil {
return errors.Errorf("failed to get the ebs status, err: %v", err)
return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
@@ -167,13 +183,13 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Attaching the ebs volume from the instance
log.Info("[Chaos]: Attaching the EBS volume from the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceIDList[i], deviceList[i], experimentsDetails.Region); err != nil {
return errors.Errorf("ebs attachment failed, err: %v", err)
return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for volume %v", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the ebs volume to the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
@@ -193,13 +209,13 @@ func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, volumeI
//Get volume attachment details
instanceID, deviceName, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
log.Errorf("fail to get the attachment info, err: %v", err)
log.Errorf("Failed to get the attachment info: %v", err)
}
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(experimentsDetails.EBSVolumeID, instanceID, experimentsDetails.Region)
if err != nil {
log.Errorf("failed to get the ebs status when an abort signal is received, err: %v", err)
log.Errorf("Failed to get the ebs status when an abort signal is received: %v", err)
}
if ebsState != "attached" {
@@ -207,13 +223,13 @@ func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, volumeI
//We first wait for the volume to get in detached state then we are attaching it.
log.Info("[Abort]: Wait for EBS complete volume detachment")
if err = ebs.WaitForVolumeDetachment(experimentsDetails.EBSVolumeID, instanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("unable to detach the ebs volume, err: %v", err)
log.Errorf("Unable to detach the ebs volume: %v", err)
}
//Attaching the ebs volume from the instance
log.Info("[Chaos]: Attaching the EBS volume from the instance")
err = ebs.EBSVolumeAttach(experimentsDetails.EBSVolumeID, instanceID, deviceName, experimentsDetails.Region)
if err != nil {
log.Errorf("ebs attachment failed when an abort signal is received, err: %v", err)
log.Errorf("EBS attachment failed when an abort signal is received: %v", err)
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)


@@ -1,21 +1,26 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-id/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@@ -23,8 +28,10 @@ var (
inject, abort chan os.Signal
)
//PrepareEC2TerminateByID contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareEC2TerminateByID contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -44,8 +51,8 @@ func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetai
//get the instance id or list of instance ids
instanceIDList := strings.Split(experimentsDetails.Ec2InstanceID, ",")
if len(instanceIDList) == 0 {
return errors.Errorf("no instance id found to terminate")
if experimentsDetails.Ec2InstanceID == "" || len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no EC2 instance ID found to terminate"}
}
// watching for the abort signal and revert the chaos
@@ -53,15 +60,15 @@ func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetai
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -72,8 +79,10 @@ func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetai
return nil
}
//injectChaosInSerialMode will inject the ec2 instance termination in serial mode that is one after other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode will inject the ec2 instance termination in serial mode that is one after other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInSerialMode")
defer span.End()
select {
case <-inject:
@@ -100,7 +109,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to stop, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
@@ -108,14 +117,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -127,13 +136,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to start, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to start the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
@@ -145,7 +154,9 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode will inject the ec2 instance termination in parallel mode that is all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInParallelMode")
defer span.End()
select {
case <-inject:
@@ -171,7 +182,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to stop, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
@@ -180,15 +191,15 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "reverted", "EC2 Instance ID", chaosDetails)
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -202,7 +213,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to start, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
@@ -210,7 +221,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to start the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
@@ -232,19 +243,19 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
log.Errorf("fail to get instance status when an abort signal is received,err :%v", err)
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
log.Errorf("unable to wait till stop of the instance, err: %v", err)
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
log.Errorf("ec2 instance failed to start when an abort signal is received, err: %v", err)
log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)


@@ -1,28 +1,35 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-tag/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
var inject, abort chan os.Signal
//PrepareEC2TerminateByTag contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByTag(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareEC2TerminateByTag contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -48,15 +55,15 @@ func PrepareEC2TerminateByTag(experimentsDetails *experimentTypes.ExperimentDeta
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err := injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err := injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -67,8 +74,10 @@ func PrepareEC2TerminateByTag(experimentsDetails *experimentTypes.ExperimentDeta
return nil
}
//injectChaosInSerialMode will inject the ec2 instance termination in serial mode that is one after other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode will inject the ec2 instance termination in serial mode that is one after other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInSerialMode")
defer span.End()
select {
case <-inject:
@@ -95,7 +104,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to stop, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
@@ -103,14 +112,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -122,13 +131,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to start, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to start the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
@@ -140,7 +149,9 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode will inject the ec2 instance termination in parallel mode that is all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInParallelMode")
defer span.End()
select {
case <-inject:
@@ -165,7 +176,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to stop, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
@@ -174,14 +185,14 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -195,7 +206,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to start, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
@@ -203,7 +214,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to start the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
@@ -216,21 +227,24 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
return nil
}
//SetTargetInstance will select the target instances which are in the running state, filtered from the given instance tag
// SetTargetInstance will select the target instances which are in the running state, filtered from the given instance tag
func SetTargetInstance(experimentsDetails *experimentTypes.ExperimentDetails) error {
instanceIDList, err := awslib.GetInstanceList(experimentsDetails.InstanceTag, experimentsDetails.Region)
instanceIDList, err := awslib.GetInstanceList(experimentsDetails.Ec2InstanceTag, experimentsDetails.Region)
if err != nil {
return err
return stacktrace.Propagate(err, "failed to get the instance id list")
}
if len(instanceIDList) == 0 {
return errors.Errorf("no instance found with the given tag %v, in region %v", experimentsDetails.InstanceTag, experimentsDetails.Region)
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeTargetSelection,
Reason: fmt.Sprintf("no instance found with the given tag %v, in region %v", experimentsDetails.Ec2InstanceTag, experimentsDetails.Region),
}
}
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
return errors.Errorf("fail to get the instance status while selecting the target instances, err: %v", err)
return stacktrace.Propagate(err, "failed to get the instance status while selecting the target instances")
}
if instanceState == "running" {
experimentsDetails.TargetInstanceIDList = append(experimentsDetails.TargetInstanceIDList, id)
@@ -238,7 +252,10 @@ func SetTargetInstance(experimentsDetails *experimentTypes.ExperimentDetails) er
}
if len(experimentsDetails.TargetInstanceIDList) == 0 {
return errors.Errorf("fail to get any running instance having instance tag: %v", experimentsDetails.InstanceTag)
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: "failed to get any running instance",
Target: fmt.Sprintf("EC2 Instance Tag: %v", experimentsDetails.Ec2InstanceTag)}
}
log.InfoWithValues("[Info]: Targeting the running instances filtered from instance tag", logrus.Fields{
@@ -257,19 +274,19 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
log.Errorf("fail to get instance status when an abort signal is received,err :%v", err)
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
log.Errorf("unable to wait till stop of the instance, err: %v", err)
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
log.Errorf("ec2 instance failed to start when an abort signal is received, err: %v", err)
log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)


@@ -0,0 +1,312 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareDiskVolumeLossByLabel contains the preparation and injection steps for the experiment
func PrepareDiskVolumeLossByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPDiskVolumeLossFaultByLabel")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
diskVolumeNamesList := common.FilterBasedOnPercentage(experimentsDetails.DiskAffectedPerc, experimentsDetails.TargetDiskVolumeNamesList)
if err := getDeviceNamesAndVMInstanceNames(diskVolumeNamesList, computeService, experimentsDetails); err != nil {
return err
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// watching for the abort signal and revert the chaos
go abortWatcher(computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will inject the disk loss chaos in serial mode, i.e., one disk after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp of when the chaos injection began
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos duration
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
return stacktrace.Propagate(err, "failed to get the disk volume status")
}
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume back to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode, i.e., on all disks at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp of when the chaos injection began
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on vm instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
}
for i := range targetDiskVolumeNamesList {
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
return stacktrace.Propagate(err, "failed to get the disk status")
}
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to reach the detached state, then attach it back.
log.Infof("[Abort]: Wait for %s complete disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume back to the instance
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getDeviceNamesAndVMInstanceNames fetches the device name and attached VM instance name for each target disk
func getDeviceNamesAndVMInstanceNames(diskVolumeNamesList []string, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails) error {
for i := range diskVolumeNamesList {
instanceName, err := gcp.GetVolumeAttachmentDetails(computeService, experimentsDetails.GCPProjectID, experimentsDetails.Zones, diskVolumeNamesList[i])
if err != nil || instanceName == "" {
return stacktrace.Propagate(err, "failed to get the disk attachment info")
}
deviceName, err := gcp.GetDiskDeviceNameForVM(computeService, diskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones, instanceName)
if err != nil {
return stacktrace.Propagate(err, "failed to fetch the disk device name")
}
experimentsDetails.TargetDiskInstanceNamesList = append(experimentsDetails.TargetDiskInstanceNamesList, instanceName)
experimentsDetails.DeviceNamesList = append(experimentsDetails.DeviceNamesList, deviceName)
}
return nil
}
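`common.FilterBasedOnPercentage`, used at the top of this file, narrows the candidate disk list by `DiskAffectedPerc`. A minimal sketch assuming it keeps roughly perc% of entries with a minimum of one; `filterByPercentage` is a hypothetical simplification, and the real helper's selection strategy (ordering, rounding) may differ:

```go
package main

import "fmt"

// filterByPercentage returns roughly perc% of names (at least one),
// a simplified stand-in for common.FilterBasedOnPercentage.
func filterByPercentage(perc int, names []string) []string {
	if len(names) == 0 {
		return nil
	}
	n := len(names) * perc / 100 // integer division truncates
	if n < 1 {
		n = 1 // always target at least one disk
	}
	if n > len(names) {
		n = len(names)
	}
	return names[:n]
}

func main() {
	disks := []string{"d1", "d2", "d3", "d4", "d5"}
	fmt.Println(filterByPercentage(40, disks)) // → [d1 d2]
}
```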

View File

@@ -1,21 +1,28 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcp "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var (
@@ -23,10 +30,10 @@ var (
inject, abort chan os.Signal
)
//PrepareDiskVolumeLoss contains the prepration and injection steps for the experiment
func PrepareDiskVolumeLoss(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var instanceNamesList []string
// PrepareDiskVolumeLoss contains the preparation and injection steps for the experiment
func PrepareDiskVolumeLoss(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMDiskLossFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -46,30 +53,13 @@ func PrepareDiskVolumeLoss(experimentsDetails *experimentTypes.ExperimentDetails
//get the disk volume names list
diskNamesList := strings.Split(experimentsDetails.DiskVolumeNames, ",")
if len(diskNamesList) == 0 {
return errors.Errorf("no volumes found to detach")
}
//get the disk zones list
diskZonesList := strings.Split(experimentsDetails.DiskZones, ",")
if len(diskZonesList) == 0 {
return errors.Errorf("no zones found for corressponding instances")
}
diskZonesList := strings.Split(experimentsDetails.Zones, ",")
if len(diskNamesList) != len(diskZonesList) {
return errors.Errorf("unequal number of disk names and zones received")
}
//prepare the instace names for the given disks
for i := range diskNamesList {
//Get volume attachment details
instanceName, err := gcp.GetVolumeAttachmentDetails(experimentsDetails.GCPProjectID, diskZonesList[i], diskNamesList[i])
if err != nil || instanceName == "" {
return errors.Errorf("failed to get the attachment info, err: %v", err)
}
instanceNamesList = append(instanceNamesList, instanceName)
//get the device names for the given disks
if err := getDeviceNamesList(computeService, experimentsDetails, diskNamesList, diskZonesList); err != nil {
return stacktrace.Propagate(err, "failed to fetch the disk device names")
}
select {
@@ -79,40 +69,39 @@ func PrepareDiskVolumeLoss(experimentsDetails *experimentTypes.ExperimentDetails
default:
// watching for the abort signal and revert the chaos
go AbortWatcher(experimentsDetails, diskNamesList, instanceNamesList, abort, chaosDetails)
go abortWatcher(computeService, experimentsDetails, diskNamesList, diskZonesList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, diskNamesList, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, diskNamesList, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
//injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList []string, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp of when the chaos injection began
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
diskZonesList := strings.Split(experimentsDetails.DiskZones, ",")
deviceNamesList := strings.Split(experimentsDetails.DeviceNames, ",")
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
@@ -123,23 +112,23 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(instanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], deviceNamesList[i]); err != nil {
return errors.Errorf("disk detachment failed, err: %v", err)
log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -149,25 +138,25 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], diskZonesList[i])
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
return errors.Errorf("failed to get the disk volume status, err: %v", err)
return stacktrace.Propagate(err, fmt.Sprintf("failed to get %s disk volume status", targetDiskVolumeNamesList[i]))
}
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume back to the instance")
if err = gcp.DiskVolumeAttach(instanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], deviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return errors.Errorf("disk attachment failed, err: %v", err)
log.Infof("[Chaos]: Attaching %s disk volume back to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for %v volume", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
@@ -177,11 +166,10 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
return nil
}
//injectChaosInParallelMode will inject the disk loss chaos in parallel mode that means all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList []string, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
diskZonesList := strings.Split(experimentsDetails.DiskZones, ",")
deviceNamesList := strings.Split(experimentsDetails.DeviceNames, ",")
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode that means all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp of when the chaos injection began
ChaosStartTimeStamp := time.Now()
@@ -198,9 +186,9 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(instanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], deviceNamesList[i]); err != nil {
return errors.Errorf("disk detachment failed, err: %v", err)
log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
@@ -209,15 +197,15 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for i := range targetDiskVolumeNamesList {
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -229,25 +217,25 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], diskZonesList[i])
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
return errors.Errorf("failed to get the disk status, err: %v", err)
}
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume to the instance")
if err = gcp.DiskVolumeAttach(instanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], deviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return errors.Errorf("disk attachment failed, err: %v", err)
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
@@ -258,45 +246,58 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
}
// abortWatcher watches for the abort signal and reverts the chaos
func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, diskNamesList []string, instanceNamesList []string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
diskZonesList := strings.Split(experimentsDetails.DiskZones, ",")
deviceNamesList := strings.Split(experimentsDetails.DeviceNames, ",")
for i := range diskNamesList {
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(diskNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], diskZonesList[i])
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
log.Errorf("failed to get the disk state when an abort signal is received, err: %v", err)
log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to reach the detached state, then attach it back.
log.Info("[Abort]: Wait for complete disk volume detachment")
log.Infof("[Abort]: Wait for complete disk volume detachment for %s", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(diskNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("unable to detach the disk volume, err: %v", err)
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume back to the instance
log.Info("[Chaos]: Attaching the disk volume from the instance")
log.Infof("[Chaos]: Attaching %s disk volume back to the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(instanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], deviceNamesList[i], diskNamesList[i])
err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
log.Errorf("disk attachment failed when an abort signal is received, err: %v", err)
log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}
common.SetTargets(diskNamesList[i], "reverted", "DiskVolume", chaosDetails)
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getDeviceNamesList fetches the device names for the target disks
func getDeviceNamesList(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, diskNamesList, diskZonesList []string) error {
for i := range diskNamesList {
deviceName, err := gcp.GetDiskDeviceNameForVM(computeService, diskNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.TargetDiskInstanceNamesList[i])
if err != nil {
return err
}
experimentsDetails.DeviceNamesList = append(experimentsDetails.DeviceNamesList, deviceName)
}
return nil
}

View File

@@ -0,0 +1,293 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var inject, abort chan os.Signal
// PrepareVMStopByLabel executes the experiment steps by injecting chaos into target VM instances
func PrepareVMStopByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPVMInstanceStopFaultByLabel")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
instanceNamesList := common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, experimentsDetails.TargetVMInstanceNameList)
log.Infof("[Chaos]: Number of instances targeted: %v", len(instanceNamesList))
// watching for the abort signal and revert the chaos
go abortWatcher(computeService, experimentsDetails, instanceNamesList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Stop the instance
for i := range instanceNamesList {
//Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "VM instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
//Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to stop", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for the chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "enable":
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start %s vm instance", instanceNamesList[i])
}
default:
// starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start %s vm instance", instanceNamesList[i])
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// power-off the instance
for i := range instanceNamesList {
// stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
}
for i := range instanceNamesList {
// wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "enable":
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start the vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
default:
// starting the VM instance
for i := range instanceNamesList {
log.Info("[Chaos]: Starting back the VM instance")
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
}
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start the vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
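Parallel mode applies each phase to every target before moving to the next phase: stop all instances, wait for all of them to go down, run probes, then start and re-check all of them. A sketch of that phase-at-a-time structure (the `forEachTarget` helper is hypothetical):

```go
package main

import "fmt"

// forEachTarget applies one phase of the experiment to every target
// before the next phase begins, mirroring the parallel-mode structure
// (stop all -> wait all down -> start all -> wait all running).
func forEachTarget(targets []string, phase string, apply func(string) error) error {
	for _, t := range targets {
		if err := apply(t); err != nil {
			return fmt.Errorf("%s failed for %s: %w", phase, t, err)
		}
	}
	return nil
}

func main() {
	targets := []string{"vm-1", "vm-2"}
	var trace []string
	record := func(phase string) func(string) error {
		return func(t string) error {
			trace = append(trace, phase+":"+t)
			return nil
		}
	}
	_ = forEachTarget(targets, "stop", record("stop"))
	_ = forEachTarget(targets, "wait-down", record("wait-down"))
	_ = forEachTarget(targets, "start", record("start"))
	fmt.Println(trace)
}
```

The contrast with serial mode is that a failure in any phase aborts the whole batch, whereas serial mode fully reverts one instance before touching the next.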
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for i := range instanceNamesList {
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil {
log.Errorf("Failed to get %s instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
}
if instanceState != "RUNNING" && experimentsDetails.ManagedInstanceGroup != "enable" {
log.Info("[Abort]: Waiting for the VM instance to shut down")
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
log.Errorf("Unable to wait till stop of %s instance, err: %v", instanceNamesList[i], err)
}
log.Info("[Abort]: Starting VM instance as abort signal received")
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil {
log.Errorf("%s instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
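The abort watcher above only restarts instances it finds in a non-RUNNING state, and leaves managed-instance-group members to the group manager. A sketch of that conditional revert (the `status`/`start` callbacks are stand-ins for the GCP status lookup and start call):

```go
package main

import "fmt"

// revertInstances restarts only the instances that are not RUNNING,
// mirroring the abort watcher's conditional revert. Instances in a
// managed instance group are skipped, since the group recreates them.
func revertInstances(instances []string, managedGroup bool, status func(string) string, start func(string) error) []string {
	var started []string
	for _, name := range instances {
		if status(name) != "RUNNING" && !managedGroup {
			if err := start(name); err != nil {
				fmt.Printf("%s instance failed to start, err: %v\n", name, err)
				continue
			}
			started = append(started, name)
		}
	}
	return started
}

func main() {
	status := func(name string) string {
		if name == "vm-2" {
			return "TERMINATED"
		}
		return "RUNNING"
	}
	started := revertInstances([]string{"vm-1", "vm-2"}, false, status, func(string) error { return nil })
	fmt.Println(started)
}
```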

View File

@ -1,21 +1,27 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var (
@ -23,8 +29,10 @@ var (
inject, abort chan os.Signal
)
// PrepareVMStop contains the preparation and injection steps for the experiment
func PrepareVMStop(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -44,33 +52,23 @@ func PrepareVMStop(experimentsDetails *experimentTypes.ExperimentDetails, client
// get the instance name or list of instance names
instanceNamesList := strings.Split(experimentsDetails.VMInstanceName, ",")
// get the zone name or list of corresponding zones for the instances
instanceZonesList := strings.Split(experimentsDetails.Zones, ",")
if len(instanceNamesList) != len(instanceZonesList) {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: "number of instances is not equal to the number of zones"}
}
go abortWatcher(computeService, experimentsDetails, instanceNamesList, instanceZonesList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// wait for the ramp time after chaos injection
@ -78,11 +76,14 @@ func PrepareVMStop(experimentsDetails *experimentTypes.ExperimentDetails, client
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
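PrepareVMStop derives its targets by splitting the comma-separated instance-name and zone values and requires the two lists to pair up one-to-one. A sketch of that validation (the `pairInstancesWithZones` helper name is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// pairInstancesWithZones splits comma-separated instance and zone lists
// and verifies they line up one-to-one, as the experiment requires.
func pairInstancesWithZones(instances, zones string) ([]string, []string, error) {
	names := strings.Split(instances, ",")
	zs := strings.Split(zones, ",")
	if len(names) != len(zs) {
		return nil, nil, fmt.Errorf("number of instances (%d) is not equal to the number of zones (%d)", len(names), len(zs))
	}
	return names, zs, nil
}

func main() {
	names, zs, err := pairInstancesWithZones("vm-1,vm-2", "us-central1-a,us-central1-b")
	fmt.Println(names, zs, err)
}
```

Indexing both slices with the same `i` in the injection loops is only safe because this length check runs first.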
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInSerialMode")
defer span.End()
select {
case <-inject:
@ -95,7 +96,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
@ -107,23 +108,23 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
for i := range instanceNamesList {
//Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
//Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -132,29 +133,44 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "disable":
// starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
default:
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInParallelMode")
defer span.End()
select {
case <-inject:
@ -167,7 +183,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
@ -179,9 +195,9 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for i := range instanceNamesList {
// stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
@ -190,15 +206,15 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for i := range instanceNamesList {
// wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -207,59 +223,82 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "disable":
// starting the VM instance
for i := range instanceNamesList {
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
}
for i := range instanceNamesList {
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
default:
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, zonesList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
if experimentsDetails.ManagedInstanceGroup != "enable" {
for i := range instanceNamesList {
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil {
log.Errorf("Failed to get %s vm instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
}
if instanceState != "RUNNING" {
log.Infof("[Abort]: Waiting for %s VM instance to shut down", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i]); err != nil {
log.Errorf("Unable to wait till stop of %s instance, err: %v", instanceNamesList[i], err)
}
log.Infof("[Abort]: Starting %s VM instance as abort signal is received", instanceNamesList[i])
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil {
log.Errorf("%s VM instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

View File

@ -0,0 +1,332 @@
package helper
import (
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var (
err error
inject, abort chan os.Signal
)
// Helper injects the http chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodHTTPFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed for the helper pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
err := prepareK8sHttpChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// prepareK8sHttpChaos contains the preparation steps before chaos injection
func prepareK8sHttpChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract the pid of the target container
td.Pid, err = common.GetPauseAndSandboxPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watch for the abort signal and revert the chaos if it is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
for _, t := range targets {
// injecting http chaos inside target container
if err = injectChaos(experimentsDetails, t); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaos(experimentsDetails, t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: chaos duration is over, reverting chaos")
var errList []string
for _, t := range targets {
// cleaning up the ip rules and the proxy process after chaos injection
err := revertChaos(experimentsDetails, t)
if err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// injectChaos injects the http chaos in the target container and adds a ruleset to iptables to redirect the target port
func injectChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
if err := startProxy(experimentDetails, t.Pid); err != nil {
killErr := killProxy(t.Pid, t.Source)
if killErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
}
return stacktrace.Propagate(err, "could not start proxy server")
}
if err := addIPRuleSet(experimentDetails, t.Pid); err != nil {
killErr := killProxy(t.Pid, t.Source)
if killErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
}
return stacktrace.Propagate(err, "could not add ip rules")
}
return nil
}
// revertChaos reverts the http chaos in the target container
func revertChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
var errList []string
if err := removeIPRuleSet(experimentDetails, t.Pid); err != nil {
errList = append(errList, err.Error())
}
if err := killProxy(t.Pid, t.Source); err != nil {
errList = append(errList, err.Error())
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
return nil
}
// startProxy starts the proxy process inside the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the proxy-related commands inside it.
func startProxy(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
toxics := os.Getenv("TOXIC_COMMAND")
// starting toxiproxy server inside the target container
startProxyServerCommand := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-server -host=0.0.0.0 > /dev/null 2>&1 &)", pid)
// Creating a proxy for the targeted service in the target container
createProxyCommand := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-cli create -l 0.0.0.0:%d -u 0.0.0.0:%d proxy)", pid, experimentDetails.ProxyPort, experimentDetails.TargetServicePort)
createToxicCommand := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-cli toxic add %s --toxicity %f proxy)", pid, toxics, float32(experimentDetails.Toxicity)/100.0)
// sleep 2 is added for proxy-server to be ready for creating proxy and adding toxics
chaosCommand := fmt.Sprintf("%s && sleep 2 && %s && %s", startProxyServerCommand, createProxyCommand, createToxicCommand)
log.Infof("[Chaos]: Starting proxy server")
if err := common.RunBashCommand(chaosCommand, "failed to start proxy server", experimentDetails.ChaosPodName); err != nil {
return err
}
log.Info("[Info]: Proxy started successfully")
return nil
}
const NoProxyToKill = "you need to specify whom to kill"
// killProxy kills the proxy process inside the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the proxy-related commands inside it.
func killProxy(pid int, source string) error {
stopProxyServerCommand := fmt.Sprintf("sudo nsenter -t %d -n sudo kill -9 $(ps aux | grep [t]oxiproxy | awk 'FNR==2{print $2}')", pid)
log.Infof("[Chaos]: Stopping proxy server")
if err := common.RunBashCommand(stopProxyServerCommand, "failed to stop proxy server", source); err != nil {
return err
}
log.Info("[Info]: Proxy stopped successfully")
return nil
}
// addIPRuleSet adds the ip rule set to iptables in the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the iptables-related commands inside it.
func addIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
// it inserts the proxy port REDIRECT rule at the beginning of the PREROUTING chain
// so that it matches all incoming packets on the target port before any other rule,
// and redirects matching requests to the proxy port
addIPRuleSetCommand := fmt.Sprintf("(sudo nsenter -t %d -n iptables -t nat -I PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d)", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Adding IPtables ruleset")
if err := common.RunBashCommand(addIPRuleSetCommand, "failed to add ip rules", experimentDetails.ChaosPodName); err != nil {
return err
}
log.Info("[Info]: IP rule set added successfully")
return nil
}
const NoIPRulesetToRemove = "No chain/target/match by that name"
// removeIPRuleSet removes the ip rule set from iptables in the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the iptables-related commands inside it.
func removeIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
removeIPRuleSetCommand := fmt.Sprintf("sudo nsenter -t %d -n iptables -t nat -D PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Removing IPtables ruleset")
if err := common.RunBashCommand(removeIPRuleSetCommand, "failed to remove ip rules", experimentDetails.ChaosPodName); err != nil {
return err
}
log.Info("[Info]: IP rule set removed successfully")
return nil
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "")
experimentDetails.TargetServicePort, _ = strconv.Atoi(types.Getenv("TARGET_SERVICE_PORT", ""))
experimentDetails.ProxyPort, _ = strconv.Atoi(types.Getenv("PROXY_PORT", ""))
experimentDetails.Toxicity, _ = strconv.Atoi(types.Getenv("TOXICITY", "100"))
}
// abortWatcher continuously watches for the abort signals
func abortWatcher(targets []targetDetails, resultName, chaosNS string, experimentDetails *experimentTypes.ExperimentDetails) {
<-abort
log.Info("[Abort]: Killing process started because of terminated signal received")
log.Info("[Abort]: Chaos Revert Started")
retry := 3
for retry > 0 {
for _, t := range targets {
if err = revertChaos(experimentDetails, t); err != nil {
if strings.Contains(err.Error(), NoIPRulesetToRemove) && strings.Contains(err.Error(), NoProxyToKill) {
continue
}
log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
Source string
}


@@ -0,0 +1,37 @@
package header
import (
"context"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpModifyHeaderChaos contains the steps to prepare and inject http modify header chaos
func PodHttpModifyHeaderChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyHeaderFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"Headers": experimentsDetails.HeadersMap,
"Header Mode": experimentsDetails.HeaderMode,
})
stream := "downstream"
if experimentsDetails.HeaderMode == "request" {
stream = "upstream"
}
args := "-t header --" + stream + " -a headers='" + (experimentsDetails.HeadersMap) + "' -a mode=" + experimentsDetails.HeaderMode
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@@ -0,0 +1,266 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
var err error
// Get the target pod details for the chaos execution
// if the target pods are not defined, it derives a random target pod list using the pods-affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Getting the serviceAccountName; the helper pod needs its permissions to create events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the http chaos in all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform http chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode injects the http chaos in all target applications in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID, args string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateHTTPChaosHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "cri-socket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"./helpers -name http-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets, args),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
MountPath: experimentsDetails.SocketPath,
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"NET_ADMIN",
"SYS_ADMIN",
},
},
},
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env variables required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets, args string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("TOXIC_COMMAND", args).
SetEnv("NETWORK_INTERFACE", experimentsDetails.NetworkInterface).
SetEnv("TARGET_SERVICE_PORT", strconv.Itoa(experimentsDetails.TargetServicePort)).
SetEnv("PROXY_PORT", strconv.Itoa(experimentsDetails.ProxyPort)).
SetEnv("TOXICITY", strconv.Itoa(experimentsDetails.Toxicity)).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
// SetChaosTunables sets a random value within the given range of values.
// If a range is not provided, it keeps the initially provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}


@@ -0,0 +1,33 @@
package latency
import (
"context"
"strconv"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpLatencyChaos contains the steps to prepare and inject http latency chaos
func PodHttpLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpLatencyFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"Latency": experimentsDetails.Latency,
})
args := "-t latency -a latency=" + strconv.Itoa(experimentsDetails.Latency)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@@ -0,0 +1,50 @@
package modifybody
import (
"context"
"fmt"
"math"
"strings"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpModifyBodyChaos contains the steps to prepare and inject http modify body chaos
func PodHttpModifyBodyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyBodyFault")
defer span.End()
// responseBodyMaxLength defines the max length of response body string to be printed. It is taken as
// the min of length of body and 120 characters to avoid printing large response body.
responseBodyMaxLength := int(math.Min(float64(len(experimentsDetails.ResponseBody)), 120))
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"ResponseBody": experimentsDetails.ResponseBody[0:responseBodyMaxLength],
"Content Type": experimentsDetails.ContentType,
"Content Encoding": experimentsDetails.ContentEncoding,
})
args := fmt.Sprintf(
`-t modify_body -a body="%v" -a content_type=%v -a content_encoding=%v`,
EscapeQuotes(experimentsDetails.ResponseBody), experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// EscapeQuotes escapes the backslashes and quotes in the given string
func EscapeQuotes(input string) string {
output := strings.ReplaceAll(input, `\`, `\\`)
output = strings.ReplaceAll(output, `"`, `\"`)
return output
}


@@ -0,0 +1,33 @@
package reset
import (
"context"
"strconv"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpResetPeerChaos contains the steps to prepare and inject http reset peer chaos
func PodHttpResetPeerChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPResetPeerFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"Reset Timeout": experimentsDetails.ResetTimeout,
})
args := "-t reset_peer -a timeout=" + strconv.Itoa(experimentsDetails.ResetTimeout)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@@ -0,0 +1,118 @@
package statuscode
import (
"context"
"fmt"
"math"
"math/rand"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"go.opentelemetry.io/otel"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
body "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib/modify-body"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
)
var acceptedStatusCodes = []string{
"200", "201", "202", "204",
"300", "301", "302", "304", "307",
"400", "401", "403", "404",
"500", "501", "502", "503", "504",
}
// PodHttpStatusCodeChaos contains the steps to prepare and inject http status code chaos
func PodHttpStatusCodeChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpStatusCodeFault")
defer span.End()
// responseBodyMaxLength defines the max length of response body string to be printed. It is taken as
// the min of length of body and 120 characters to avoid printing large response body.
responseBodyMaxLength := int(math.Min(float64(len(experimentsDetails.ResponseBody)), 120))
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"StatusCode": experimentsDetails.StatusCode,
"ModifyResponseBody": experimentsDetails.ModifyResponseBody,
"ResponseBody": experimentsDetails.ResponseBody[0:responseBodyMaxLength],
"Content Type": experimentsDetails.ContentType,
"Content Encoding": experimentsDetails.ContentEncoding,
})
args := fmt.Sprintf(
`-t status_code -a status_code=%s -a modify_response_body=%d -a response_body="%v" -a content_type=%s -a content_encoding=%s`,
experimentsDetails.StatusCode, stringBoolToInt(experimentsDetails.ModifyResponseBody), body.EscapeQuotes(experimentsDetails.ResponseBody),
experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// GetStatusCode performs two functions:
// 1. If no status code is provided, it selects a random status code from the supported list
// 2. It checks whether the provided status code(s) are valid
func GetStatusCode(statusCode string) (string, error) {
if statusCode == "" {
log.Info("[Info]: No status code provided. Selecting a status code randomly from supported status codes")
return acceptedStatusCodes[rand.Intn(len(acceptedStatusCodes))], nil
}
statusCodeList := strings.Split(statusCode, ",")
rand.Seed(time.Now().Unix())
if len(statusCodeList) == 1 {
if checkStatusCode(statusCodeList[0], acceptedStatusCodes) {
return statusCodeList[0], nil
}
} else {
acceptedCodes := getAcceptedCodesInList(statusCodeList, acceptedStatusCodes)
if len(acceptedCodes) == 0 {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("invalid status code: %s", statusCode)}
}
return acceptedCodes[rand.Intn(len(acceptedCodes))], nil
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("status code '%s' is not supported. Supported status codes are: %v", statusCode, acceptedStatusCodes)}
}
// getAcceptedCodesInList returns the list of accepted status codes from a list of status codes
func getAcceptedCodesInList(statusCodeList []string, acceptedStatusCodes []string) []string {
var acceptedCodes []string
for _, statusCode := range statusCodeList {
if checkStatusCode(statusCode, acceptedStatusCodes) {
acceptedCodes = append(acceptedCodes, statusCode)
}
}
return acceptedCodes
}
// checkStatusCode checks if the provided status code is present in acceptedStatusCode list
func checkStatusCode(statusCode string, acceptedStatusCodes []string) bool {
for _, code := range acceptedStatusCodes {
if code == statusCode {
return true
}
}
return false
}
// stringBoolToInt converts a boolean string ("true"/"false") to an int (1/0); unparsable input yields 0
func stringBoolToInt(b string) int {
parsedBool, err := strconv.ParseBool(b)
if err != nil {
return 0
}
if parsedBool {
return 1
}
return 0
}
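The status-code selection logic above can be exercised standalone. A minimal sketch, assuming a small `acceptedStatusCodes` list (the real package-level list is larger):

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"strings"
)

// acceptedStatusCodes is a stand-in for the package-level list.
var acceptedStatusCodes = []string{"404", "500", "503"}

// checkStatusCode reports whether statusCode is in the accepted list.
func checkStatusCode(statusCode string, accepted []string) bool {
	for _, code := range accepted {
		if code == statusCode {
			return true
		}
	}
	return false
}

// getStatusCode mirrors GetStatusCode: empty input picks a random supported
// code; otherwise the comma-separated input is filtered down to its accepted
// entries and one of them is picked at random.
func getStatusCode(statusCode string) (string, error) {
	if statusCode == "" {
		return acceptedStatusCodes[rand.Intn(len(acceptedStatusCodes))], nil
	}
	var accepted []string
	for _, code := range strings.Split(statusCode, ",") {
		if checkStatusCode(code, acceptedStatusCodes) {
			accepted = append(accepted, code)
		}
	}
	if len(accepted) == 0 {
		return "", errors.New("invalid status code: " + statusCode)
	}
	return accepted[rand.Intn(len(accepted))], nil
}

func main() {
	code, err := getStatusCode("404,999")
	fmt.Println(code, err) // "999" is filtered out, leaving "404"
}
```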


@@ -0,0 +1,165 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/load/k6-loadgen/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectK6LoadGenFault")
defer span.End()
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
// creating the helper pod to perform k6-loadgen chaos
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareK6LoadGenFault")
defer span.End()
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Starting the k6-loadgen experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not execute chaos")
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateK6LoadGenFaultHelperPod")
defer span.End()
const volumeName = "script-volume"
const mountPath = "/mnt"
var envs []corev1.EnvVar
args := []string{
mountPath + "/" + experimentsDetails.ScriptSecretKey,
"-q",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--tag",
"trace_id=" + span.SpanContext().TraceID().String(),
}
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
envs = []corev1.EnvVar{
{
Name: "K6_OTEL_METRIC_PREFIX",
Value: experimentsDetails.OTELMetricPrefix,
},
{
Name: "K6_OTEL_GRPC_EXPORTER_INSECURE",
Value: "true",
},
{
Name: "K6_OTEL_GRPC_EXPORTER_ENDPOINT",
Value: otelExporterEndpoint,
},
}
args = append(args, "--out", "experimental-opentelemetry")
}
helperPod := &corev1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: corev1.PodSpec{
RestartPolicy: corev1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Containers: []corev1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: corev1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"k6",
"run",
},
Args: args,
Env: envs,
Resources: chaosDetails.Resources,
VolumeMounts: []corev1.VolumeMount{
{
Name: volumeName,
MountPath: mountPath,
},
},
},
},
Volumes: []corev1.Volume{
{
Name: volumeName,
VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: experimentsDetails.ScriptSecretName,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}


@@ -1,26 +1,34 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/workloads"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kafka/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/annotation"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PreparePodDelete contains the prepration steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKafkaPodDeleteFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.ChaoslibDetail.RampTime != 0 {
@@ -30,15 +38,15 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
switch strings.ToLower(experimentsDetails.ChaoslibDetail.Sequence) {
case "serial":
if err := injectChaosInSerialMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.ChaoslibDetail.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.ChaoslibDetail.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -50,11 +58,12 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
}
// injectChaosInSerialMode deletes the kafka broker pods in serial mode (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -67,26 +76,26 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or KAFKA_BROKER")
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
}
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, experimentsDetails.ChaoslibDetail.PodsAffectedPerc, clients, chaosDetails)
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
// deriving the parent name of the target resources
if chaosDetails.AppDetail.Kind != "" {
for _, pod := range targetPodList.Items {
parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
if err != nil {
return err
}
common.SetParentName(parentName, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return err
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.ChaoslibDetail.EngineName != "" {
@@ -102,18 +111,18 @@ func injectChaosInSerialMode(experimentsDetai
"PodName": pod.Name})
if experimentsDetails.ChaoslibDetail.Force {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
return err
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
@@ -126,8 +135,15 @@ func injectChaosInSerialMode(experimentsDetai
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.ChaoslibDetail.AppNS, experimentsDetails.ChaoslibDetail.AppLabel, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@@ -138,11 +154,12 @@ func injectChaosInSerialMode(experimentsDetai
}
// injectChaosInParallelMode deletes the kafka broker pods in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -155,26 +172,25 @@ func injectChaosInParallelMode(experimentsDet
for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or KAFKA_BROKER")
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
}
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, experimentsDetails.ChaoslibDetail.PodsAffectedPerc, clients, chaosDetails)
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
if chaosDetails.AppDetail.Kind != "" {
for _, pod := range targetPodList.Items {
parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
if err != nil {
return err
}
common.SetParentName(parentName, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.ChaoslibDetail.EngineName != "" {
@@ -190,19 +206,19 @@ func injectChaosInParallelMode(experimentsDet
"PodName": pod.Name})
if experimentsDetails.ChaoslibDetail.Force {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
return err
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
@@ -215,8 +231,15 @@ func injectChaosInParallelMode(experimentsDet
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.ChaoslibDetail.AppNS, experimentsDetails.ChaoslibDetail.AppLabel, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())


@@ -1,31 +1,39 @@
package lib
import (
"context"
"fmt"
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/kubelet-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareKubeletKill contains the preparation steps before chaos injection
func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareKubeletKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKubeletServiceKillFault")
defer span.End()
var err error
if experimentsDetails.TargetNode == "" {
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@@ -33,7 +41,7 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
"NodeName": experimentsDetails.TargetNode,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -48,55 +56,34 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
// Creating the helper pod to perform node memory hog
if err = createHelperPod(experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(experimentsDetails.TargetNode, "targeted", "node", chaosDetails)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
}
// Checking for the node to be in not-ready state
log.Info("[Status]: Check for the node to be in NotReady state")
if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("application node is not in NotReady state, err: %v", err)
if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
}
return stacktrace.Propagate(err, "could not check for NOT READY state")
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@@ -104,11 +91,14 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateKubeletServiceKillFaultHelperPod")
defer span.End()
privileged := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
@@ -117,7 +107,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, "", experimentsDetails.ExperimentName),
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
@@ -189,8 +179,16 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
func ptrint64(p int64) *int64 {


@@ -1,6 +1,7 @@
package helper
import (
"context"
"fmt"
"os"
"os/exec"
@@ -10,14 +11,18 @@ import (
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
clientTypes "k8s.io/apimachinery/pkg/types"
)
@@ -27,12 +32,15 @@
)
var (
err error
inject, abort chan os.Signal
err error
inject, abort chan os.Signal
sPorts, dPorts, whitelistDPorts, whitelistSPorts []string
)
// Helper injects the network chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodNetworkFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -53,10 +61,11 @@ func Helper(clients clients.ClientSets) {
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Intialise the chaos attributes
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise Chaos Result Parameters
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
@@ -64,211 +73,304 @@ func Helper(clients clients.ClientSets) {
err := preparePodNetworkChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
//preparePodNetworkChaos contains the prepration steps before chaos injection
// preparePodNetworkChaos contains the preparation steps before chaos injection
func preparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
containerID, err := getContainerID(experimentsDetails, clients)
if err != nil {
return err
}
// extract out the pid of the target container
targetPID, err := common.GetPID(experimentsDetails.ContainerRuntime, containerID, experimentsDetails.SocketPath)
if err != nil {
return err
targetEnv := os.Getenv("TARGETS")
if targetEnv == "" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "no target found, provide at least one target"}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
var targets []targetDetails
for _, t := range strings.Split(targetEnv, ";") {
target := strings.Split(t, ":")
if len(target) != 4 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported target format: '%v'", t)}
}
td := targetDetails{
Name: target[0],
Namespace: target[1],
TargetContainer: target[2],
DestinationIps: getDestIps(target[3]),
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the network ns path of the pod sandbox or pause container
td.NetworkNsPath, err = common.GetNetworkNsPath(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container network ns path")
}
targets = append(targets, td)
}
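The loop above decodes the TARGETS env variable as semicolon-separated entries of the form name:namespace:container:serviceMesh. A minimal, self-contained sketch of that decoding (the target type and parseTargets helper are illustrative names, not part of the experiment code):

```go
package main

import (
	"fmt"
	"strings"
)

// target mirrors the fields decoded from one TARGETS entry.
type target struct {
	Name, Namespace, Container, ServiceMesh string
}

// parseTargets splits "name:ns:container:mesh;..." into structured targets,
// rejecting any entry that does not have exactly four colon-separated fields.
func parseTargets(env string) ([]target, error) {
	var out []target
	for _, entry := range strings.Split(env, ";") {
		parts := strings.Split(entry, ":")
		if len(parts) != 4 {
			return nil, fmt.Errorf("unsupported target format: %q", entry)
		}
		out = append(out, target{parts[0], parts[1], parts[2], parts[3]})
	}
	return out, nil
}

func main() {
	targets, err := parseTargets("nginx-0:default:nginx:false;redis-0:cache:redis:true")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(targets), targets[1].Namespace)
}
```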
// watching for the abort signal and revert the chaos
go abortWatcher(targetPID, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails.TargetPods)
// injecting network chaos inside target container
if err = injectChaos(experimentsDetails, targetPID); err != nil {
return err
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment")
// cleaning the netem process after chaos injection
if err = killnetem(targetPID); err != nil {
return err
}
return result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods)
}
//getContainerID extract out the container id of the target container
func getContainerID(experimentDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
var containerID string
switch experimentDetails.ContainerRuntime {
case "docker":
host := "unix://" + experimentDetails.SocketPath
// deriving the container id of the pause container
cmd := "sudo docker --host " + host + " ps | grep k8s_POD_" + experimentDetails.TargetPods + "_" + experimentDetails.AppNS + " | awk '{print $1}'"
out, err := exec.Command("/bin/sh", "-c", cmd).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[docker]: Failed to run docker ps command: %s", string(out)))
return "", err
}
containerID = strings.TrimSpace(string(out))
case "containerd", "crio":
containerID, err = common.GetContainerID(experimentDetails.AppNS, experimentDetails.TargetPods, experimentDetails.TargetContainer, clients)
if err != nil {
return containerID, err
}
default:
return "", errors.Errorf("%v container runtime not supported", experimentDetails.ContainerRuntime)
}
log.Infof("Container ID: %v", containerID)
return containerID, nil
}
// injectChaos inject the network chaos in target container
// it is using nsenter command to enter into network namespace of target container
// and execute the netem command inside it.
func injectChaos(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
netemCommands := os.Getenv("NETEM_COMMAND")
destinationIPs := os.Getenv("DESTINATION_IPS")
go abortWatcher(targets, experimentsDetails.NetworkInterface, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
if destinationIPs == "" {
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc replace dev %s root netem %v", pid, experimentDetails.NetworkInterface, netemCommands)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
}
for index, t := range targets {
// injecting network chaos inside target container
if err = injectChaos(experimentsDetails.NetworkInterface, t); err != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
if experimentsDetails.EngineName != "" {
msg := "Injected " + experimentsDetails.ExperimentName + " chaos on application pods"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Duration is over, reverting chaos")
if err := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
return nil
}
func revertChaosForAllTargets(targets []targetDetails, networkInterface string, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
killed, err := killnetem(targets[i], networkInterface)
if !killed && err != nil {
errList = append(errList, err.Error())
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// injectChaos inject the network chaos in target container
// it is using nsenter command to enter into network namespace of target container
// and execute the netem command inside it.
func injectChaos(netInterface string, target targetDetails) error {
netemCommands := os.Getenv("NETEM_COMMAND")
if len(target.DestinationIps) == 0 && len(sPorts) == 0 && len(dPorts) == 0 && len(whitelistDPorts) == 0 && len(whitelistSPorts) == 0 {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %s root %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create tc rules", target.Source); err != nil {
return err
}
} else {
// Create a priority-based queue
// This instantly creates classes 1:1, 1:2, 1:3
priority := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v root handle 1: prio", target.NetworkNsPath, netInterface)
log.Info(priority)
if err := common.RunBashCommand(priority, "failed to create priority-based queue", target.Source); err != nil {
return err
}
// Add queueing discipline for 1:3 class.
// No traffic is going through 1:3 yet
traffic := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v parent 1:3 %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(traffic)
if err := common.RunBashCommand(traffic, "failed to create netem queueing discipline", target.Source); err != nil {
return err
}
if len(whitelistDPorts) != 0 || len(whitelistSPorts) != 0 {
for _, port := range whitelistDPorts {
//redirect traffic to specific dport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip dport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist dport match filters", target.Source); err != nil {
return err
}
}
for _, port := range whitelistSPorts {
//redirect traffic to specific sport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip sport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist sport match filters", target.Source); err != nil {
return err
}
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst 0.0.0.0/0 flowid 1:3", target.NetworkNsPath, netInterface)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create rule for all ports match filters", target.Source); err != nil {
return err
}
} else {
ips := strings.Split(destinationIPs, ",")
var uniqueIps []string
// removing duplicate IPs from the list, if any
for i := range ips {
isPresent := false
for j := range uniqueIps {
if ips[i] == uniqueIps[j] {
isPresent = true
}
for i := range target.DestinationIps {
var (
ip = target.DestinationIps[i]
ports []string
isIPV6 = strings.Contains(target.DestinationIps[i], ":")
)
// extracting the destination ports from the ips
// ip format is ip(|port1|port2....|portx)
if strings.Contains(target.DestinationIps[i], "|") {
ip = strings.Split(target.DestinationIps[i], "|")[0]
ports = strings.Split(target.DestinationIps[i], "|")[1:]
}
if !isPresent {
uniqueIps = append(uniqueIps, ips[i])
}
}
// Create a priority-based queue
// This instantly creates classes 1:1, 1:2, 1:3
priority := fmt.Sprintf("sudo nsenter -t %v -n tc qdisc replace dev %v root handle 1: prio", pid, experimentDetails.NetworkInterface)
cmd := exec.Command("/bin/bash", "-c", priority)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
// Add queueing discipline for 1:3 class.
// No traffic is going through 1:3 yet
traffic := fmt.Sprintf("sudo nsenter -t %v -n tc qdisc replace dev %v parent 1:3 netem %v", pid, experimentDetails.NetworkInterface, netemCommands)
cmd = exec.Command("/bin/bash", "-c", traffic)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
for _, ip := range uniqueIps {
// redirect traffic to specific IP through band 3
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst %v flowid 1:3", pid, experimentDetails.NetworkInterface, ip)
if strings.Contains(ip, ":") {
tc = fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip6 dst %v flowid 1:3", pid, experimentDetails.NetworkInterface, ip)
filter := fmt.Sprintf("match ip dst %v", ip)
if isIPV6 {
filter = fmt.Sprintf("match ip6 dst %v", ip)
}
cmd = exec.Command("/bin/bash", "-c", tc)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
if len(ports) != 0 {
for _, port := range ports {
portFilter := fmt.Sprintf("%s match ip dport %v 0xffff", filter, port)
if isIPV6 {
portFilter = fmt.Sprintf("%s match ip6 dport %v 0xffff", filter, port)
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, portFilter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
continue
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, filter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
for _, port := range sPorts {
//redirect traffic to specific sport through band 3
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip sport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create source ports match filters", target.Source); err != nil {
return err
}
}
for _, port := range dPorts {
//redirect traffic to specific dport through band 3
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ports match filters", target.Source); err != nil {
return err
}
}
}
}
log.Infof("chaos injected successfully on {pod: %v, container: %v}", target.Name, target.TargetContainer)
return nil
}
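Destination entries may carry ports appended with "|", i.e. ip(|port1|port2...|portx), and IPv6 destinations switch the u32 selector from ip to ip6. A sketch of splitting such an entry and building the match expression (splitIPPorts and matchFilter are hypothetical helper names):

```go
package main

import (
	"fmt"
	"strings"
)

// splitIPPorts decodes an entry of the form "ip(|port1|port2...)" into the
// bare IP and its optional port list.
func splitIPPorts(entry string) (ip string, ports []string) {
	parts := strings.Split(entry, "|")
	return parts[0], parts[1:]
}

// matchFilter builds the u32 match expression used by the tc filter,
// switching between the ip and ip6 selectors for IPv6 destinations.
func matchFilter(ip, port string) string {
	proto := "ip"
	if strings.Contains(ip, ":") {
		proto = "ip6"
	}
	filter := fmt.Sprintf("match %s dst %s", proto, ip)
	if port != "" {
		filter = fmt.Sprintf("%s match %s dport %s 0xffff", filter, proto, port)
	}
	return filter
}

func main() {
	ip, ports := splitIPPorts("10.0.0.5|80|443")
	for _, p := range ports {
		fmt.Println(matchFilter(ip, p))
	}
}
```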
// killnetem kill the netem process for all the target containers
func killnetem(PID int) error {
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc delete dev eth0 root", PID)
func killnetem(target targetDetails, networkInterface string) (bool, error) {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc delete dev %s root", target.NetworkNsPath, networkInterface)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
log.Info(cmd.String())
// ignoring err if qdisc process doesn't exist inside the target container
if strings.Contains(string(out), qdiscNotFound) || strings.Contains(string(out), qdiscNoFileFound) {
log.Warn("The network chaos process has already been removed")
return nil
return true, err
}
return err
log.Error(err.Error())
return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", target.Name, target.Namespace, target.TargetContainer), Reason: fmt.Sprintf("failed to revert network faults: %s", string(out))}
}
return nil
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", target.Name, target.Namespace, target.TargetContainer)
return true, nil
}
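killnetem treats certain tc error messages as "already reverted" rather than failures. A sketch of that classification; the two message constants are assumed values for illustration (the actual qdiscNotFound/qdiscNoFileFound strings are defined elsewhere in the helper and may vary by iproute2 version):

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed example values; the helper defines the real constants elsewhere.
const (
	qdiscNotFound    = "Cannot delete qdisc with handle of zero"
	qdiscNoFileFound = "RTNETLINK answers: No such file or directory"
)

// alreadyReverted reports whether a failed `tc qdisc delete` output simply
// means the netem qdisc was never installed or was removed earlier.
func alreadyReverted(output string) bool {
	return strings.Contains(output, qdiscNotFound) || strings.Contains(output, qdiscNoFileFound)
}

func main() {
	fmt.Println(alreadyReverted("RTNETLINK answers: No such file or directory"))
}
```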
type targetDetails struct {
Name string
Namespace string
ServiceMesh string
DestinationIps []string
TargetContainer string
ContainerId string
Source string
NetworkNsPath string
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "eth0")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.DestinationIPs = types.Getenv("DESTINATION_IPS", "")
experimentDetails.SourcePorts = types.Getenv("SOURCE_PORTS", "")
experimentDetails.DestinationPorts = types.Getenv("DESTINATION_PORTS", "")
if strings.TrimSpace(experimentDetails.DestinationPorts) != "" {
if strings.Contains(experimentDetails.DestinationPorts, "!") {
whitelistDPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.DestinationPorts), "!"), ",")
} else {
dPorts = strings.Split(strings.TrimSpace(experimentDetails.DestinationPorts), ",")
}
}
if strings.TrimSpace(experimentDetails.SourcePorts) != "" {
if strings.Contains(experimentDetails.SourcePorts, "!") {
whitelistSPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.SourcePorts), "!"), ",")
} else {
sPorts = strings.Split(strings.TrimSpace(experimentDetails.SourcePorts), ",")
}
}
}
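SOURCE_PORTS and DESTINATION_PORTS accept either a plain comma-separated list or a "!"-prefixed list that whitelists the ports instead (getENV stores the results in the sPorts/dPorts and whitelist slices). A hedged sketch of that parsing (parsePorts is an illustrative name):

```go
package main

import (
	"fmt"
	"strings"
)

// parsePorts interprets a comma-separated port list; a leading "!" flips the
// semantics from "target these ports" to "whitelist (exclude) these ports".
func parsePorts(env string) (ports []string, whitelist bool) {
	env = strings.TrimSpace(env)
	if env == "" {
		return nil, false
	}
	if strings.HasPrefix(env, "!") {
		return strings.Split(strings.TrimPrefix(env, "!"), ","), true
	}
	return strings.Split(env, ","), false
}

func main() {
	ports, whitelist := parsePorts("!8080,9090")
	fmt.Println(ports, whitelist)
}
```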
// abortWatcher continuously watch for the abort signals
func abortWatcher(targetPID int, resultName, chaosNS, targetPodName string) {
func abortWatcher(targets []targetDetails, networkInterface, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
@@ -276,15 +378,46 @@ func abortWatcher(targetPID int, resultName, chaosNS, targetPodName string) {
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err = killnetem(targetPID); err != nil {
log.Errorf("unable to kill netem process, err :%v", err)
for _, t := range targets {
killed, err := killnetem(t, networkInterface)
if err != nil && !killed {
log.Errorf("unable to kill netem process, err :%v", err)
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
}
}
retry--
time.Sleep(1 * time.Second)
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", targetPodName); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
func getDestIps(serviceMesh string) []string {
var (
destIps = os.Getenv("DESTINATION_IPS")
uniqueIps []string
)
if serviceMesh == "true" {
destIps = os.Getenv("DESTINATION_IPS_SERVICE_MESH")
}
if strings.TrimSpace(destIps) == "" {
return nil
}
ips := strings.Split(strings.TrimSpace(destIps), ",")
// removing duplicate IPs from the list, if any
for i := range ips {
if !common.Contains(ips[i], uniqueIps) {
uniqueIps = append(uniqueIps, ips[i])
}
}
return uniqueIps
}
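getDestIps drops repeated IPs with common.Contains, which rescans the slice for every entry. An equivalent map-based sketch that preserves first-seen order:

```go
package main

import "fmt"

// dedupe removes repeated entries while preserving first-seen order,
// mirroring what getDestIps does with common.Contains.
func dedupe(ips []string) []string {
	seen := make(map[string]struct{}, len(ips))
	var out []string
	for _, ip := range ips {
		if _, ok := seen[ip]; ok {
			continue
		}
		seen[ip] = struct{}{}
		out = append(out, ip)
	}
	return out
}

func main() {
	fmt.Println(dedupe([]string{"10.0.0.1", "10.0.0.2", "10.0.0.1"}))
}
```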


@@ -1,17 +1,26 @@
package corruption
import (
"strconv"
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
//PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkCorruptionFault")
defer span.End()
args := "corrupt " + strconv.Itoa(experimentsDetails.NetworkPacketCorruptionPercentage)
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
args := "netem corrupt " + experimentsDetails.NetworkPacketCorruptionPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
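Each fault package now prefixes its tc arguments with "netem" and appends the optional correlation percentage only when it is set. The shared assembly pattern reduces to (netemArgs is an illustrative helper, not part of the diff):

```go
package main

import "fmt"

// netemArgs assembles the tc-netem argument string for a fault, appending the
// optional correlation percentage only when it is greater than zero.
func netemArgs(fault, value string, correlation int) string {
	args := fmt.Sprintf("netem %s %s", fault, value)
	if correlation > 0 {
		args = fmt.Sprintf("%s %d", args, correlation)
	}
	return args
}

func main() {
	fmt.Println(netemArgs("corrupt", "100", 25))
	fmt.Println(netemArgs("loss", "10", 0))
}
```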


@@ -1,17 +1,26 @@
package duplication
import (
"strconv"
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
//PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkDuplicationFault")
defer span.End()
args := "duplicate " + strconv.Itoa(experimentsDetails.NetworkPacketDuplicationPercentage)
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
args := "netem duplicate " + experimentsDetails.NetworkPacketDuplicationPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@@ -1,17 +1,27 @@
package latency
import (
"context"
"fmt"
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
//PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLatencyFault")
defer span.End()
args := "delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms " + strconv.Itoa(experimentsDetails.Jitter) + "ms"
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
args := "netem delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms " + strconv.Itoa(experimentsDetails.Jitter) + "ms"
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@@ -1,17 +1,26 @@
package loss
import (
"strconv"
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
//PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLossFault")
defer span.End()
args := "loss " + strconv.Itoa(experimentsDetails.NetworkPacketLossPercentage)
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
args := "netem loss " + experimentsDetails.NetworkPacketLossPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@@ -1,41 +1,52 @@
package lib
import (
"context"
"fmt"
"net"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareAndInjectChaos contains the prepration & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
var serviceMesh = []string{"istio", "envoy"}
var destIpsSvcMesh string
var destIps string
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
var err error
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
logExperimentFields(experimentsDetails)
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -47,53 +58,41 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
experimentsDetails.DestinationIPs, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts)
if err != nil {
return err
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode inject the network chaos in all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -101,38 +100,27 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not set destination ips")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer, serviceMesh), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting the the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@@ -140,75 +128,68 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode injects the network chaos in all target applications in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInParallelMode")
defer span.End()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
targets, err := filterPodsForNodes(targetPodList, experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not filter target pods")
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
runID := stringutils.GetRunID()
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s:%s", k.Name, k.Namespace, k.TargetContainer, k.ServiceMesh))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
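The parallel path above packs all victims for a node into a single TARGETS value: one `name:namespace:container:serviceMesh` entry per pod, entries joined with `;`. A minimal standalone sketch of that encoding and its inverse (the helper names `encodeTargets`/`decodeTargets` are illustrative, not the repo's API):

```go
package main

import (
	"fmt"
	"strings"
)

// target mirrors the fields packed into one TARGETS entry.
type target struct {
	Name, Namespace, TargetContainer, ServiceMesh string
}

// encodeTargets joins one entry per pod with ';', matching the
// "name:namespace:container:serviceMesh" layout passed to the helper.
func encodeTargets(targets []target) string {
	entries := make([]string, 0, len(targets))
	for _, t := range targets {
		entries = append(entries, fmt.Sprintf("%s:%s:%s:%s", t.Name, t.Namespace, t.TargetContainer, t.ServiceMesh))
	}
	return strings.Join(entries, ";")
}

// decodeTargets reverses encodeTargets, skipping malformed entries.
func decodeTargets(encoded string) []target {
	var out []target
	for _, entry := range strings.Split(encoded, ";") {
		parts := strings.Split(entry, ":")
		if len(parts) != 4 {
			continue
		}
		out = append(out, target{parts[0], parts[1], parts[2], parts[3]})
	}
	return out
}

func main() {
	enc := encodeTargets([]target{
		{"nginx-0", "default", "nginx", "false"},
		{"nginx-1", "default", "nginx", "true"},
	})
	fmt.Println(enc) // nginx-0:default:nginx:false;nginx-1:default:nginx:true
	fmt.Println(len(decodeTargets(enc)))
}
```

The `;`/`:` separators keep the value a single env var, so one helper pod per node can fan out over every pod scheduled there.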
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, args, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets string, nodeName, runID, args string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodNetworkFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
var (
privilegedEnable = true
terminationGracePeriodSeconds = int64(experimentsDetails.TerminationGracePeriodSeconds)
helperName = fmt.Sprintf("%s-helper-%s", experimentsDetails.ExperimentName, stringutils.GetRunID())
)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Name: helperName,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Tolerations: chaosDetails.Tolerations,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
@@ -236,7 +217,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name network-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName, args),
Env: getPodEnv(ctx, experimentsDetails, targets, args),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
@@ -257,18 +238,40 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
// mount the network ns path for crio runtime
// it is required to access the sandbox network ns
if strings.ToLower(experimentsDetails.ContainerRuntime) == "crio" {
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, apiv1.Volume{
Name: "netns-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/netns",
},
},
})
helperPod.Spec.Containers[0].VolumeMounts = append(helperPod.Spec.Containers[0].VolumeMounts, apiv1.VolumeMount{
Name: "netns-path",
MountPath: "/var/run/netns",
})
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env vars required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, args string) []apiv1.EnvVar {
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string, args string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@@ -278,21 +281,37 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, a
SetEnv("NETWORK_INTERFACE", experimentsDetails.NetworkInterface).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("DESTINATION_IPS", experimentsDetails.DestinationIPs).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("DESTINATION_IPS", destIps).
SetEnv("DESTINATION_IPS_SERVICE_MESH", destIpsSvcMesh).
SetEnv("SOURCE_PORTS", experimentsDetails.SourcePorts).
SetEnv("DESTINATION_PORTS", experimentsDetails.DestinationPorts).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
// GetTargetIps return the comma separated target ips
// It fetch the ips from the target ips (if defined by users)
// it append the ips from the host, if target host is provided
func GetTargetIps(targetIPs, targetHosts string) (string, error) {
type targetsDetails struct {
Target []target
}
ipsFromHost, err := getIpsForTargetHosts(targetHosts)
type target struct {
Namespace string
Name string
TargetContainer string
ServiceMesh string
}
// GetTargetIps returns the comma-separated target ips
// It fetches the ips from the target ips (if defined by users)
// it appends the ips from the host, if target host is provided
func GetTargetIps(targetIPs, targetHosts string, clients clients.ClientSets, serviceMesh bool) (string, error) {
ipsFromHost, err := getIpsForTargetHosts(targetHosts, clients, serviceMesh)
if err != nil {
return "", err
return "", stacktrace.Propagate(err, "could not get ips from target hosts")
}
if targetIPs == "" {
targetIPs = ipsFromHost
@@ -302,8 +321,51 @@ func GetTargetIps(targetIPs, targetHosts string) (string, error) {
return targetIPs, nil
}
// getPodIPFromService derives the pod ips from the kubernetes service
func getPodIPFromService(host string, clients clients.ClientSets) ([]string, error) {
var ips []string
svcFields := strings.Split(host, ".")
if len(svcFields) != 5 {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{host: %s}", host), Reason: "provide the valid FQDN for the service in '<svc-name>.<namespace>.svc.cluster.local' format"}
}
svcName, svcNs := svcFields[0], svcFields[1]
svc, err := clients.GetService(svcNs, svcName)
if err != nil {
if k8serrors.IsForbidden(err) {
log.Warnf("forbidden - failed to get %v service in %v namespace, err: %v", svcName, svcNs, err)
return ips, nil
}
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{serviceName: %s, namespace: %s}", svcName, svcNs), Reason: err.Error()}
}
if svc.Spec.Selector == nil {
return nil, nil
}
var svcSelector string
for k, v := range svc.Spec.Selector {
if svcSelector == "" {
svcSelector += fmt.Sprintf("%s=%s", k, v)
continue
}
svcSelector += fmt.Sprintf(",%s=%s", k, v)
}
pods, err := clients.ListPods(svcNs, svcSelector)
if err != nil {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{svcName: %s, podLabel: %s, namespace: %s}", svcName, svcSelector, svcNs), Reason: fmt.Sprintf("failed to derive pods from service: %s", err.Error())}
}
for _, p := range pods.Items {
if p.Status.PodIP == "" {
continue
}
ips = append(ips, p.Status.PodIP)
}
return ips, nil
}
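`getPodIPFromService` above only accepts the five-field FQDN form `<svc-name>.<namespace>.svc.cluster.local` before it looks up the service. A standalone sketch of that validation (the extra suffix checks on `svc.cluster.local` are my assumption; the repo's code only checks the field count):

```go
package main

import (
	"fmt"
	"strings"
)

// splitServiceFQDN validates a "<svc>.<ns>.svc.cluster.local" host and
// returns the service name and namespace, or an error for any other shape.
func splitServiceFQDN(host string) (name, namespace string, err error) {
	fields := strings.Split(host, ".")
	if len(fields) != 5 || fields[2] != "svc" || fields[3] != "cluster" || fields[4] != "local" {
		return "", "", fmt.Errorf("%q is not in '<svc-name>.<namespace>.svc.cluster.local' format", host)
	}
	return fields[0], fields[1], nil
}

func main() {
	name, ns, err := splitServiceFQDN("payments.shop.svc.cluster.local")
	fmt.Println(name, ns, err) // payments shop <nil>
}
```

With name and namespace in hand, the real function fetches the service, turns its selector into a label query, and collects the pod IPs behind it.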
// getIpsForTargetHosts resolves IP addresses for a comma-separated list of target hosts and returns the comma-separated ips
func getIpsForTargetHosts(targetHosts string) (string, error) {
func getIpsForTargetHosts(targetHosts string, clients clients.ClientSets, serviceMesh bool) (string, error) {
if targetHosts == "" {
return "", nil
}
@@ -311,12 +373,50 @@ func getIpsForTargetHosts(targetHosts string) (string, error) {
finalHosts := ""
var commaSeparatedIPs []string
for i := range hosts {
ips, err := net.LookupIP(hosts[i])
hosts[i] = strings.TrimSpace(hosts[i])
var (
hostName = hosts[i]
ports []string
)
if strings.Contains(hosts[i], "|") {
host := strings.Split(hosts[i], "|")
hostName = host[0]
ports = host[1:]
log.Infof("host and ports: %v: %v", hostName, ports)
}
if strings.Contains(hostName, "svc.cluster.local") && serviceMesh {
ips, err := getPodIPFromService(hostName, clients)
if err != nil {
return "", stacktrace.Propagate(err, "could not get pod ips from service")
}
log.Infof("Host: {%v}, IP address: {%v}", hosts[i], ips)
if ports != nil {
for j := range ips {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j]+"|"+strings.Join(ports, "|"))
}
} else {
commaSeparatedIPs = append(commaSeparatedIPs, ips...)
}
if finalHosts == "" {
finalHosts = hosts[i]
} else {
finalHosts = finalHosts + "," + hosts[i]
}
continue
}
ips, err := net.LookupIP(hostName)
if err != nil {
log.Warnf("Unknown host: {%v}, it won't be included in the scope of chaos", hosts[i])
log.Warnf("Unknown host: {%v}, it won't be included in the scope of chaos", hostName)
} else {
for j := range ips {
log.Infof("Host: {%v}, IP address: {%v}", hosts[i], ips[j])
log.Infof("Host: {%v}, IP address: {%v}", hostName, ips[j])
if ports != nil {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String()+"|"+strings.Join(ports, "|"))
continue
}
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String())
}
if finalHosts == "" {
@@ -327,8 +427,121 @@ func getIpsForTargetHosts(targetHosts string) (string, error) {
}
}
if len(commaSeparatedIPs) == 0 {
return "", errors.Errorf("provided hosts: {%v} are invalid, unable to resolve", targetHosts)
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("hosts: %s", targetHosts), Reason: "provided hosts are invalid, unable to resolve"}
}
log.Infof("Injecting chaos on {%v} hosts", finalHosts)
return strings.Join(commaSeparatedIPs, ","), nil
}
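The port-filtering branch above lets a `TARGET_HOSTS` entry carry ports after `|` separators (e.g. `example.com|80|443`), and each resolved IP keeps the same `|`-suffixed port list. A reduced sketch of the split and re-attach steps (`splitHostPorts`/`attachPorts` are illustrative names, not the repo's API):

```go
package main

import (
	"fmt"
	"strings"
)

// splitHostPorts separates a "host|port|port" entry into the bare host
// name and its optional port list.
func splitHostPorts(entry string) (host string, ports []string) {
	parts := strings.Split(strings.TrimSpace(entry), "|")
	return parts[0], parts[1:]
}

// attachPorts re-appends the port list to each resolved IP, mirroring the
// "ip|port|port" layout handed to the helper pod.
func attachPorts(ips, ports []string) []string {
	if len(ports) == 0 {
		return ips
	}
	out := make([]string, 0, len(ips))
	for _, ip := range ips {
		out = append(out, ip+"|"+strings.Join(ports, "|"))
	}
	return out
}

func main() {
	host, ports := splitHostPorts("example.com|80|443")
	fmt.Println(host, ports)                                   // example.com [80 443]
	fmt.Println(attachPorts([]string{"10.0.0.1"}, ports))      // [10.0.0.1|80|443]
}
```

Keeping the ports attached to each IP lets the helper scope its tc filters to specific host:port pairs instead of whole hosts.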
// SetChaosTunables picks a random value within the given range for each tunable.
// If a value is not provided as a range, the initially provided value is used as-is.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.NetworkPacketLossPercentage = common.ValidateRange(experimentsDetails.NetworkPacketLossPercentage)
experimentsDetails.NetworkPacketCorruptionPercentage = common.ValidateRange(experimentsDetails.NetworkPacketCorruptionPercentage)
experimentsDetails.NetworkPacketDuplicationPercentage = common.ValidateRange(experimentsDetails.NetworkPacketDuplicationPercentage)
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
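`common.ValidateRange` is what lets each tunable above be given either as a plain value or as a range like `"50-90"` from which a random value is drawn. A standalone re-implementation of those assumed semantics (this is a sketch of the behavior, not the repo's function):

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
)

// validateRange returns the value unchanged unless it looks like "lo-hi",
// in which case a random integer in [lo, hi] is chosen.
func validateRange(value string) string {
	parts := strings.Split(value, "-")
	if len(parts) != 2 {
		return value
	}
	lo, err1 := strconv.Atoi(parts[0])
	hi, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil || hi < lo {
		return value
	}
	return strconv.Itoa(lo + rand.Intn(hi-lo+1))
}

func main() {
	fmt.Println(validateRange("100"))   // plain value: returned unchanged
	fmt.Println(validateRange("50-90")) // random value between 50 and 90
}
```

Randomizing per run keeps repeated experiments from always hitting the same loss/corruption percentage while still staying inside user-set bounds.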
// isServiceMeshEnabledForPod checks whether the pod contains a service mesh sidecar
func isServiceMeshEnabledForPod(pod apiv1.Pod) bool {
for _, c := range pod.Spec.Containers {
if common.SubStringExistsInSlice(c.Name, serviceMesh) {
return true
}
}
return false
}
func setDestIps(pod apiv1.Pod, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
var err error
if isServiceMeshEnabledForPod(pod) {
if destIpsSvcMesh == "" {
destIpsSvcMesh, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, true)
if err != nil {
return "false", err
}
}
return "true", nil
}
if destIps == "" {
destIps, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, false)
if err != nil {
return "false", err
}
}
return "false", nil
}
func filterPodsForNodes(targetPodList apiv1.PodList, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (map[string]*targetsDetails, error) {
targets := make(map[string]*targetsDetails)
targetContainer := experimentsDetails.TargetContainer
for _, pod := range targetPodList.Items {
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
if err != nil {
return targets, stacktrace.Propagate(err, "could not set destination ips")
}
if experimentsDetails.TargetContainer == "" {
targetContainer = pod.Spec.Containers[0].Name
}
td := target{
Name: pod.Name,
Namespace: pod.Namespace,
TargetContainer: targetContainer,
ServiceMesh: serviceMesh,
}
if targets[pod.Spec.NodeName] == nil {
targets[pod.Spec.NodeName] = &targetsDetails{
Target: []target{td},
}
} else {
targets[pod.Spec.NodeName].Target = append(targets[pod.Spec.NodeName].Target, td)
}
}
return targets, nil
}
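`filterPodsForNodes` above buckets the target pods under the node that hosts them, so the parallel path can launch one helper per node instead of one per pod. The grouping step, reduced to its core (field names trimmed to what the grouping needs):

```go
package main

import "fmt"

// podInfo carries just the fields the grouping needs.
type podInfo struct {
	Name, NodeName string
}

// groupByNode buckets pods under the node that hosts them, mirroring the
// map[node]*targetsDetails built by filterPodsForNodes.
func groupByNode(pods []podInfo) map[string][]string {
	targets := make(map[string][]string)
	for _, p := range pods {
		targets[p.NodeName] = append(targets[p.NodeName], p.Name)
	}
	return targets
}

func main() {
	grouped := groupByNode([]podInfo{
		{"web-0", "node-a"},
		{"web-1", "node-b"},
		{"web-2", "node-a"},
	})
	fmt.Println(len(grouped["node-a"]), len(grouped["node-b"])) // 2 1
}
```

One helper per node matters because the helper needs host-level access (host PID, CRI socket) on the node where its victims run; co-locating all of a node's targets in one pod avoids redundant privileged pods.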
func logExperimentFields(experimentsDetails *experimentTypes.ExperimentDetails) {
switch experimentsDetails.NetworkChaosType {
case "network-loss":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketLossPercentage": experimentsDetails.NetworkPacketLossPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-latency":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkLatency": strconv.Itoa(experimentsDetails.NetworkLatency),
"Jitter": experimentsDetails.Jitter,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-corruption":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketCorruptionPercentage": experimentsDetails.NetworkPacketCorruptionPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-duplication":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketDuplicationPercentage": experimentsDetails.NetworkPacketDuplicationPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-rate-limit":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkBandwidth": experimentsDetails.NetworkBandwidth,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
}
}

View File

@@ -0,0 +1,29 @@
package rate
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkRateChaos contains the steps to prepare and inject chaos
func PodNetworkRateChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkRateLimit")
defer span.End()
args := fmt.Sprintf("tbf rate %s burst %s limit %s", experimentsDetails.NetworkBandwidth, experimentsDetails.Burst, experimentsDetails.Limit)
if experimentsDetails.PeakRate != "" {
args = fmt.Sprintf("%s peakrate %s", args, experimentsDetails.PeakRate)
}
if experimentsDetails.MinBurst != "" {
args = fmt.Sprintf("%s mtu %s", args, experimentsDetails.MinBurst)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
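The rate-limit fault hands the helper a ready-made tc tbf argument string, appending `peakrate` and `mtu` only when set. A sketch that mirrors the Sprintf chain above, showing what the assembled args look like for sample values:

```go
package main

import "fmt"

// buildTbfArgs assembles the tc tbf qdisc arguments the same way
// PodNetworkRateChaos does, appending the optional peakrate/mtu fields.
func buildTbfArgs(bandwidth, burst, limit, peakRate, minBurst string) string {
	args := fmt.Sprintf("tbf rate %s burst %s limit %s", bandwidth, burst, limit)
	if peakRate != "" {
		args = fmt.Sprintf("%s peakrate %s", args, peakRate)
	}
	if minBurst != "" {
		args = fmt.Sprintf("%s mtu %s", args, minBurst)
	}
	return args
}

func main() {
	fmt.Println(buildTbfArgs("1mbit", "32kb", "125kb", "2mbit", ""))
	// tbf rate 1mbit burst 32kb limit 125kb peakrate 2mbit
}
```

These are standard token-bucket-filter knobs: `rate` caps sustained throughput, `burst` bounds the bucket size, `limit` bounds queued bytes, and `peakrate`/`mtu` constrain short-term bursts.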

View File

@@ -1,10 +1,17 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-cpu-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -12,14 +19,26 @@ import (
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeCPUHog contains prepration steps before chaos injection
func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeCPUHog contains preparation steps before chaos injection
func PrepareNodeCPUHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeCPUHogFault")
defer span.End()
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Node CPU Cores": experimentsDetails.NodeCPUcores,
"CPU Load": experimentsDetails.CPULoad,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -28,32 +47,34 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
}
//Select node for node-cpu-hog
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, experimentsDetails.NodesAffectedPerc, clients)
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
"Node Names": targetNodeList,
})
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -65,14 +86,15 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
}
// injectChaosInSerialMode stresses the cpu of all the target nodes serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInSerialMode")
defer span.End()
nodeCPUCores := experimentsDetails.NodeCPUcores
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -86,31 +108,31 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// When the number of cpu cores to hog is not defined, it is taken from the node capacity
if nodeCPUCores == 0 {
if nodeCPUCores == "0" {
if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not get node cpu capacity")
}
}
log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
"NodeName": appNode,
"NodeCPUcores": experimentsDetails.NodeCPUcores,
"NodeCPUCores": experimentsDetails.NodeCPUcores,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
// Creating the helper pod to perform node cpu hog
if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
@@ -119,32 +141,35 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
}
}
return nil
}
// injectChaosInParallelMode stresses the cpu of all the target nodes in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
nodeCPUCores := experimentsDetails.NodeCPUcores
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
nodeCPUCores := experimentsDetails.NodeCPUcores
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@@ -154,9 +179,9 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
}
// When the number of CPU cores to hog is not defined, it is taken from the node capacity
if nodeCPUCores == 0 {
if nodeCPUCores == "0" {
if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not get node cpu capacity")
}
}
@ -165,68 +190,44 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"NodeCPUcores": experimentsDetails.NodeCPUcores,
})
experimentsDetails.RunID = common.GetRunID()
// Creating the helper pod to perform node cpu hog
if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
//setCPUCapacity fetch the node cpu capacity
// setCPUCapacity fetches the node CPU capacity
func setCPUCapacity(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets) error {
node, err := clients.KubeClient.CoreV1().Nodes().Get(appNode, v1.GetOptions{})
node, err := clients.GetNode(appNode, experimentsDetails.Timeout, experimentsDetails.Delay)
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNode), Reason: err.Error()}
}
cpuCapacity, _ := node.Status.Capacity.Cpu().AsInt64()
experimentsDetails.NodeCPUcores = int(cpuCapacity)
experimentsDetails.NodeCPUcores = node.Status.Capacity.Cpu().String()
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeCPUHogFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
@ -243,9 +244,9 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
Args: []string{
"--cpu",
strconv.Itoa(experimentsDetails.NodeCPUcores),
experimentsDetails.NodeCPUcores,
"--cpu-load",
strconv.Itoa(experimentsDetails.CPULoad),
experimentsDetails.CPULoad,
"--timeout",
strconv.Itoa(experimentsDetails.ChaosDuration),
},
@ -255,6 +256,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
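The new `createHelperPod` appends sidecar containers and volumes only when sidecars are configured. The pattern, illustrated with small stand-in types (the real code uses `k8s.io/api/core/v1` types and the `common.BuildSidecar`/`common.GetSidecarVolumes` helpers):

```go
package main

import "fmt"

// Minimal stand-ins for the Kubernetes API types used by the helper pod;
// the names here are illustrative only.
type Container struct{ Name string }
type Volume struct{ Name string }

type PodSpec struct {
	Containers []Container
	Volumes    []Volume
}

type ChaosDetails struct {
	SideCar []Container // sidecar definitions supplied via the ChaosEngine
}

// appendSidecars mirrors the guard in createHelperPod: sidecar containers
// and their volumes are appended only when sidecars are configured.
func appendSidecars(spec *PodSpec, details ChaosDetails, volumes []Volume) {
	if len(details.SideCar) == 0 {
		return
	}
	spec.Containers = append(spec.Containers, details.SideCar...)
	spec.Volumes = append(spec.Volumes, volumes...)
}

func main() {
	spec := PodSpec{Containers: []Container{{Name: "node-cpu-hog"}}}
	appendSidecars(&spec, ChaosDetails{SideCar: []Container{{Name: "log-shipper"}}}, []Volume{{Name: "shared"}})
	fmt.Println(len(spec.Containers), len(spec.Volumes))
}
```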
// setChaosTunables sets a random value within the given range of values.
// If a range is not provided, it keeps the initially provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.NodeCPUcores = common.ValidateRange(experimentsDetails.NodeCPUcores)
experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)
experimentsDetails.NodesAffectedPerc = common.ValidateRange(experimentsDetails.NodesAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}


@ -1,7 +1,8 @@
package lib
import (
"bytes"
"context"
"fmt"
"os"
"os/exec"
"os/signal"
@ -10,7 +11,12 @@ import (
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-drain/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@ -19,9 +25,7 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
@ -30,8 +34,10 @@ var (
inject, abort chan os.Signal
)
//PrepareNodeDrain contains the prepration steps before chaos injection
func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeDrain contains the preparation steps before chaos injection
func PrepareNodeDrain(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeDrainFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -53,7 +59,7 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
//Select node for node-drain
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@ -65,7 +71,7 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -74,21 +80,33 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)
// Drain the application node
if err := drainNode(experimentsDetails, clients, chaosDetails); err != nil {
return err
if err := drainNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because of an error during node drain")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return stacktrace.Propagate(err, "could not drain node")
}
// Verify the status of AUT after reschedule
log.Info("[Status]: Verify the status of AUT after reschedule")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return errors.Errorf("application status check failed, err: %v", err)
if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because application status check failed")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return err
}
// Verify the status of Auxiliary Applications after reschedule
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running")
if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return errors.Errorf("auxiliary Applications status check failed, err: %v", err)
log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return err
}
}
@ -100,7 +118,7 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
// Uncordon the application node
if err := uncordonNode(experimentsDetails, clients, chaosDetails); err != nil {
return err
return stacktrace.Propagate(err, "could not uncordon the target node")
}
//Waiting for the ramp time after chaos injection
@ -111,8 +129,10 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
return nil
}
// drainNode drain the application node
func drainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// drainNode drains the target node
func drainNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeDrainFault")
defer span.End()
select {
case <-inject:
@ -121,13 +141,9 @@ func drainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
default:
log.Infof("[Inject]: Draining the %v node", experimentsDetails.TargetNode)
command := exec.Command("kubectl", "drain", experimentsDetails.TargetNode, "--ignore-daemonsets", "--delete-local-data", "--force", "--timeout", strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
var out, stderr bytes.Buffer
command.Stdout = &out
command.Stderr = &stderr
if err := command.Run(); err != nil {
log.Infof("Error String: %v", stderr.String())
return errors.Errorf("Unable to drain the %v node, err: %v", experimentsDetails.TargetNode, err)
command := exec.Command("kubectl", "drain", experimentsDetails.TargetNode, "--ignore-daemonsets", "--delete-emptydir-data", "--force", "--timeout", strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), "failed to drain the target node", cerrors.ErrorTypeChaosInject); err != nil {
return err
}
common.SetTargets(experimentsDetails.TargetNode, "injected", "node", chaosDetails)
@ -136,12 +152,12 @@ func drainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.TargetNode, v1.GetOptions{})
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
if !nodeSpec.Spec.Unschedulable {
return errors.Errorf("%v node is not in unschedulable state", experimentsDetails.TargetNode)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: "node is not in unschedulable state"}
}
return nil
})
@ -156,25 +172,21 @@ func uncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients
for _, targetNode := range targetNodes {
//Check node exist before uncordon the node
_, err := clients.KubeClient.CoreV1().Nodes().Get(targetNode, metav1.GetOptions{})
_, err := clients.GetNode(targetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
if apierrors.IsNotFound(err) {
log.Infof("[Info]: The %v node no longer exists, skipping uncordon", targetNode)
common.SetTargets(targetNode, "noLongerExist", "node", chaosDetails)
continue
} else {
return errors.Errorf("unable to get the %v node, err: %v", targetNode, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
}
}
log.Infof("[Recover]: Uncordon the %v node", targetNode)
command := exec.Command("kubectl", "uncordon", targetNode)
var out, stderr bytes.Buffer
command.Stdout = &out
command.Stderr = &stderr
if err := command.Run(); err != nil {
log.Infof("Error String: %v", stderr.String())
return errors.Errorf("unable to uncordon the %v node, err: %v", targetNode, err)
if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", targetNode), "failed to uncordon the target node", cerrors.ErrorTypeChaosInject); err != nil {
return err
}
common.SetTargets(targetNode, "reverted", "node", chaosDetails)
}
@ -185,16 +197,16 @@ func uncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients
Try(func(attempt uint) error {
targetNodes := strings.Split(experimentsDetails.TargetNode, ",")
for _, targetNode := range targetNodes {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(targetNode, v1.GetOptions{})
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), targetNode, v1.GetOptions{})
if err != nil {
if apierrors.IsNotFound(err) {
continue
} else {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
}
}
if nodeSpec.Spec.Unschedulable {
return errors.Errorf("%v node is in unschedulable state", experimentsDetails.TargetNode)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: "target node is still in unschedulable state"}
}
}
return nil


@ -1,10 +1,17 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-io-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@ -12,14 +19,27 @@ import (
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeIOStress contains prepration steps before chaos injection
func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeIOStress contains preparation steps before chaos injection
func PrepareNodeIOStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeIOStressFault")
defer span.End()
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"CPU Core": experimentsDetails.CPU,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -28,9 +48,10 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
}
//Select node for node-io-stress
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, experimentsDetails.NodesAffectedPerc, clients)
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
@ -38,22 +59,22 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
})
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -65,13 +86,13 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
}
// injectChaosInSerialMode stresses the IO of all the target nodes serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -90,52 +111,45 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
// Creating the helper pod to perform node io stress
if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(appNode, "injected", "node", chaosDetails)
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
common.SetTargets(appNode, "reverted", "node", chaosDetails)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
common.SetTargets(appNode, "targeted", "node", chaosDetails)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode stresses the IO of all the target nodes in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@ -150,57 +164,37 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
})
experimentsDetails.RunID = common.GetRunID()
// Creating the helper pod to perform node io stress
if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "injected", "node", chaosDetails)
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "reverted", "node", chaosDetails)
}
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeIOStressFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
@ -222,39 +216,47 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
var hddbytes string
if experimentsDetails.FilesystemUtilizationBytes == 0 {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
if experimentsDetails.FilesystemUtilizationBytes == "0" {
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = "10%"
log.Info("Neither FilesystemUtilizationPercentage nor FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage of 10%")
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
}
} else {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationBytes) + "G"
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = experimentsDetails.FilesystemUtilizationBytes + "G"
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
log.Warn("Both FilesystemUtilizationPercentage and FilesystemUtilizationBytes provided as inputs, using the percentage value to proceed with the stress experiment")
}
}
stressArgs := []string{
"--cpu",
strconv.Itoa(experimentsDetails.CPU),
experimentsDetails.CPU,
"--vm",
strconv.Itoa(experimentsDetails.VMWorkers),
experimentsDetails.VMWorkers,
"--io",
strconv.Itoa(experimentsDetails.NumberOfWorkers),
experimentsDetails.NumberOfWorkers,
"--hdd",
strconv.Itoa(experimentsDetails.NumberOfWorkers),
experimentsDetails.NumberOfWorkers,
"--hdd-bytes",
hddbytes,
"--timeout",
@ -264,3 +266,15 @@ func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails
}
return stressArgs
}
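The `--hdd-bytes` selection in `getContainerArguments` follows a simple precedence rule: the percentage wins when both tunables are set, and everything defaults to 10%. Extracted into a standalone function (values are strings because, after this diff, the tunables may carry ranges):

```go
package main

import "fmt"

// hddBytes reproduces the selection logic above for the stress-ng
// --hdd-bytes argument: percentage takes precedence when both are set,
// and the default is 10% of the filesystem.
func hddBytes(utilBytes, utilPercentage string) string {
	if utilBytes == "0" {
		if utilPercentage == "0" {
			return "10%" // neither provided: default to 10%
		}
		return utilPercentage + "%"
	}
	if utilPercentage == "0" {
		return utilBytes + "G"
	}
	return utilPercentage + "%" // both provided: percentage wins
}

func main() {
	fmt.Println(hddBytes("0", "0"))  // default
	fmt.Println(hddBytes("5", "0"))  // bytes only
	fmt.Println(hddBytes("0", "30")) // percentage only
	fmt.Println(hddBytes("5", "30")) // both: percentage wins
}
```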
// setChaosTunables sets a random value within the given range of values.
// If a range is not provided, it keeps the initially provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.FilesystemUtilizationBytes = common.ValidateRange(experimentsDetails.FilesystemUtilizationBytes)
experimentsDetails.FilesystemUtilizationPercentage = common.ValidateRange(experimentsDetails.FilesystemUtilizationPercentage)
experimentsDetails.CPU = common.ValidateRange(experimentsDetails.CPU)
experimentsDetails.VMWorkers = common.ValidateRange(experimentsDetails.VMWorkers)
experimentsDetails.NumberOfWorkers = common.ValidateRange(experimentsDetails.NumberOfWorkers)
experimentsDetails.NodesAffectedPerc = common.ValidateRange(experimentsDetails.NodesAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}


@ -1,25 +1,44 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeMemoryHog contains preparation steps before chaos injection
func PrepareNodeMemoryHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeMemoryHogFault")
defer span.End()
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
"MemoryConsumptionMebibytes": experimentsDetails.MemoryConsumptionMebibytes,
"MemoryConsumptionPercentage": experimentsDetails.MemoryConsumptionPercentage,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -28,32 +47,34 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
}
//Select node for node-memory-hog
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
"Node Names": targetNodeList,
})
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -65,13 +86,13 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
}
// injectChaosInSerialMode stresses the memory of all the target nodes serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -90,68 +111,50 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
})
experimentsDetails.RunID = stringutils.GetRunID()
//Getting node memory details
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node memory details")
}
//Getting the exact memory value to exhaust
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil {
return stacktrace.Propagate(err, "could not calculate memory consumption value")
}
// Creating the helper pod to perform node memory hog
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
common.SetTargets(appNode, "targeted", "node", chaosDetails)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode stresses the memory of all the target nodes in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@ -166,54 +169,32 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
})
//Getting node memory details
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node memory details")
}
//Getting the exact memory value to exhaust
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil {
return stacktrace.Propagate(err, "could not calculate memory consumption value")
}
// Creating the helper pod to perform node memory hog
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
@ -221,38 +202,36 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
// getNodeMemoryDetails will return the total memory capacity and memory allocatable of an application node
func getNodeMemoryDetails(appNodeName string, clients clients.ClientSets) (int, int, error) {
nodeDetails, err := clients.GetNode(appNodeName, 180, 2)
if err != nil {
return 0, 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: err.Error()}
}
memoryCapacity := int(nodeDetails.Status.Capacity.Memory().Value())
memoryAllocatable := int(nodeDetails.Status.Allocatable.Memory().Value())
if memoryCapacity == 0 || memoryAllocatable == 0 {
return memoryCapacity, memoryAllocatable, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: "failed to get memory details of the target node"}
}
return memoryCapacity, memoryAllocatable, nil
}
// calculateMemoryConsumption will calculate the amount of memory to be consumed for a given unit.
func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDetails, memoryCapacity, memoryAllocatable int) (string, error) {
var totalMemoryConsumption int
var MemoryConsumption string
var selector string
if experimentsDetails.MemoryConsumptionMebibytes == "0" {
if experimentsDetails.MemoryConsumptionPercentage == "0" {
log.Info("Neither MemoryConsumptionPercentage nor MemoryConsumptionMebibytes provided, proceeding with a default MemoryConsumptionPercentage value of 30%")
return "30%", nil
}
selector = "percentage"
} else {
if experimentsDetails.MemoryConsumptionPercentage == "0" {
selector = "mebibytes"
} else {
log.Warn("Both MemoryConsumptionPercentage & MemoryConsumptionMebibytes provided as inputs, using the MemoryConsumptionPercentage value to proceed with the experiment")
@ -265,12 +244,13 @@ func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
case "percentage":
//Getting the total memory under chaos
memoryConsumptionPercentage, _ := strconv.ParseFloat(experimentsDetails.MemoryConsumptionPercentage, 64)
memoryForChaos := (memoryConsumptionPercentage / 100) * float64(memoryCapacity)
//Get the percentage of memory under chaos wrt allocatable memory
totalMemoryConsumption = int((memoryForChaos / float64(memoryAllocatable)) * 100)
if totalMemoryConsumption > 100 {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is more than 100 percent (%d percent) of Allocatable Memory, so the experiment will only consume upto 100 percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
MemoryConsumption = "100%"
} else {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is %d percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
@ -282,7 +262,9 @@ func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
// Bringing all the values in Ki unit to compare
// since 1Mi = 1025.390625Ki
memoryConsumptionMebibytes, _ := strconv.ParseFloat(experimentsDetails.MemoryConsumptionMebibytes, 64)
TotalMemoryConsumption := memoryConsumptionMebibytes * 1025.390625
// since 1Ki = 1024 bytes
memoryAllocatable := memoryAllocatable / 1024
@ -290,24 +272,26 @@ func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
MemoryConsumption = strconv.Itoa(memoryAllocatable) + "k"
log.Infof("[Info]: The memory for consumption %vKi is more than the available memory %vKi, so the experiment will hog the memory upto %vKi", int(TotalMemoryConsumption), memoryAllocatable, memoryAllocatable)
} else {
MemoryConsumption = experimentsDetails.MemoryConsumptionMebibytes + "m"
}
return MemoryConsumption, nil
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: "specify the memory consumption value either in percentage or mebibytes in a non-decimal format using respective envs"}
}
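The percentage branch above does a unit-free conversion that is easy to miss: the requested percentage applies to total node capacity, but stress-ng consumes relative to allocatable memory, so the value is re-expressed against allocatable and capped at 100. A sketch of just that arithmetic with illustrative numbers (the function name and inputs are hypothetical, not the library API):

```go
package main

import "fmt"

// percentOfAllocatable takes the requested percentage of total capacity and
// re-expresses it as a percentage of allocatable memory, capping at 100.
func percentOfAllocatable(requestedPct float64, capacity, allocatable int) int {
	memoryForChaos := (requestedPct / 100) * float64(capacity)
	pct := int((memoryForChaos / float64(allocatable)) * 100)
	if pct > 100 {
		return 100
	}
	return pct
}

func main() {
	// 80% of a 16 GiB node whose allocatable memory is 14 GiB
	fmt.Println(percentOfAllocatable(80, 16<<30, 14<<30))
}
```

Because allocatable is always below capacity, the effective percentage is higher than the requested one, which is why the cap is needed.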
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, MemoryConsumption string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeMemoryHogFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
@ -324,7 +308,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
Args: []string{
"--vm",
experimentsDetails.NumberOfWorkers,
"--vm-bytes",
MemoryConsumption,
"--timeout",
@ -336,6 +320,24 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// setChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.MemoryConsumptionMebibytes = common.ValidateRange(experimentsDetails.MemoryConsumptionMebibytes)
experimentsDetails.MemoryConsumptionPercentage = common.ValidateRange(experimentsDetails.MemoryConsumptionPercentage)
experimentsDetails.NumberOfWorkers = common.ValidateRange(experimentsDetails.NumberOfWorkers)
experimentsDetails.NodesAffectedPerc = common.ValidateRange(experimentsDetails.NodesAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}


@ -1,23 +1,26 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-restart/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
@ -31,17 +34,20 @@ const (
privateKeySecret string = "private-key-cm-"
emptyDirVolume string = "empty-dir-"
ObjectNameField = "metadata.name"
)
// PrepareNodeRestart contains preparation steps before chaos injection
func PrepareNodeRestart(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeRestartFault")
defer span.End()
//Select the node
if experimentsDetails.TargetNode == "" {
//Select node for node-restart
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node name")
}
}
@ -49,7 +55,7 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
if experimentsDetails.TargetNodeIP == "" {
experimentsDetails.TargetNodeIP, err = getInternalIP(experimentsDetails.TargetNode, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get internal ip")
}
}
@ -58,8 +64,7 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
"Target Node IP": experimentsDetails.TargetNodeIP,
})
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -72,45 +77,25 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
// Creating the helper pod to perform node restart
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
}
common.SetTargets(experimentsDetails.TargetNode, "targeted", "node", chaosDetails)
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@ -118,14 +103,17 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", strconv.Itoa(experimentsDetails.RampTime))
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, clients clients.ClientSets) error {
// This method attaches an emptyDir along with the secret volume and copies data from the secret
// to the emptyDir, because the secret is mounted as read-only with 777 perms and can't be changed
// because of: https://github.com/kubernetes/kubernetes/issues/57923
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeRestartFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
@ -133,7 +121,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
@ -147,7 +135,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
{
MatchFields: []apiv1.NodeSelectorRequirement{
{
Key: ObjectNameField,
Operator: apiv1.NodeSelectorOpNotIn,
Values: []string{experimentsDetails.TargetNode},
},
@ -198,20 +186,28 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getInternalIP gets the internal ip of the given node
func getInternalIP(nodeName string, clients clients.ClientSets) (string, error) {
node, err := clients.GetNode(nodeName, 180, 2)
if err != nil {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: err.Error()}
}
for _, addr := range node.Status.Addresses {
if strings.ToLower(string(addr.Type)) == "internalip" {
return addr.Address, nil
}
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: "failed to get the internal ip of the target node"}
}
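The address scan in `getInternalIP` can be exercised in isolation by decoupling it from the Kubernetes client. This sketch uses a hypothetical `nodeAddress` struct standing in for `corev1.NodeAddress`; the lookup logic mirrors the loop above:

```go
package main

import (
	"fmt"
	"strings"
)

// nodeAddress is a minimal stand-in for the node address entries a
// Kubernetes node reports in its status.
type nodeAddress struct {
	Type    string
	Address string
}

// internalIP returns the first address whose type is InternalIP,
// matched case-insensitively, or an error if none is present.
func internalIP(addrs []nodeAddress) (string, error) {
	for _, addr := range addrs {
		if strings.ToLower(addr.Type) == "internalip" {
			return addr.Address, nil
		}
	}
	return "", fmt.Errorf("no InternalIP address found")
}

func main() {
	addrs := []nodeAddress{
		{Type: "Hostname", Address: "node-1"},
		{Type: "InternalIP", Address: "10.0.0.12"},
	}
	ip, _ := internalIP(addrs)
	fmt.Println(ip)
}
```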


@ -1,13 +1,20 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-taint/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -15,9 +22,7 @@ import (
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var (
@@ -25,8 +30,10 @@ var (
inject, abort chan os.Signal
)
//PrepareNodeTaint contains the prepration steps before chaos injection
func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeTaint contains the preparation steps before chaos injection
func PrepareNodeTaint(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeTaintFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -48,7 +55,7 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@@ -60,7 +67,7 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -69,21 +76,28 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)
// taint the application node
if err := taintNode(experimentsDetails, clients, chaosDetails); err != nil {
return err
if err := taintNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not taint node")
}
// Verify the status of AUT after reschedule
log.Info("[Status]: Verify the status of AUT after reschedule")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return errors.Errorf("application status check failed, err: %v", err)
if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
return err
}
// Verify the status of Auxiliary Applications after reschedule
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running")
if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return errors.Errorf("auxiliary Applications status check failed, err: %v", err)
log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
return err
}
}
@@ -95,7 +109,7 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
// remove taint from the application node
if err := removeTaintFromNode(experimentsDetails, clients, chaosDetails); err != nil {
return err
return stacktrace.Propagate(err, "could not remove taint from node")
}
//Waiting for the ramp time after chaos injection
@@ -107,7 +121,9 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
}
// taintNode taint the application node
func taintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
func taintNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeTaintFault")
defer span.End()
// get the taint labels & effect
taintKey, taintValue, taintEffect := getTaintDetails(experimentsDetails)
@@ -115,9 +131,9 @@ func taintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
log.Infof("Add %v taints to the %v node", taintKey+"="+taintValue+":"+taintEffect, experimentsDetails.TargetNode)
// get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil || node == nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.TargetNode, err)
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
// check if the taint already exists
@@ -141,9 +157,8 @@ func taintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
Effect: apiv1.TaintEffect(taintEffect),
})
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(node)
if err != nil || updatedNodeWithTaint == nil {
return errors.Errorf("failed to update %v node after adding taints, err: %v", experimentsDetails.TargetNode, err)
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to add taints: %s", err.Error())}
}
}
@@ -162,9 +177,9 @@ func removeTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails,
taintKey := strings.Split(taintLabel[0], "=")[0]
// get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil || node == nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.TargetNode, err)
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
// check if the taint already exists
@@ -177,17 +192,16 @@
}
if tainted {
var Newtaints []apiv1.Taint
var newTaints []apiv1.Taint
// remove all the taints with matching key
for _, taint := range node.Spec.Taints {
if taint.Key != taintKey {
Newtaints = append(Newtaints, taint)
newTaints = append(newTaints, taint)
}
}
node.Spec.Taints = Newtaints
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(node)
if err != nil || updatedNodeWithTaint == nil {
return errors.Errorf("failed to update %v node after removing taints, err: %v", experimentsDetails.TargetNode, err)
node.Spec.Taints = newTaints
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to remove taints: %s", err.Error())}
}
}
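The revert path above rebuilds the node's taint list by keeping every taint whose key does not match, exactly the filter renamed from `Newtaints` to `newTaints` in this hunk. A runnable sketch of that filter, with a local stand-in for `apiv1.Taint`:

```go
package main

import "fmt"

// taint is a minimal stand-in for apiv1.Taint so the sketch runs
// without the Kubernetes API packages.
type taint struct {
	Key, Value, Effect string
}

// removeTaintsByKey drops every taint whose Key matches taintKey, which is
// how the revert path restores node.Spec.Taints before calling UpdateNode.
func removeTaintsByKey(taints []taint, taintKey string) []taint {
	var newTaints []taint
	for _, t := range taints {
		if t.Key != taintKey {
			newTaints = append(newTaints, t)
		}
	}
	return newTaints
}

func main() {
	taints := []taint{
		{Key: "litmus/chaos", Value: "taint", Effect: "NoExecute"},
		{Key: "dedicated", Value: "infra", Effect: "NoSchedule"},
	}
	fmt.Println(removeTaintsByKey(taints, "litmus/chaos"))
}
```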


@@ -1,16 +1,23 @@
package lib
import (
"math"
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-autoscaler/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/math"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
@@ -19,8 +26,6 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
appsv1 "k8s.io/client-go/kubernetes/typed/apps/v1"
retries "k8s.io/client-go/util/retry"
"github.com/pkg/errors"
)
var (
@@ -29,8 +34,10 @@ var (
appsv1StatefulsetClient appsv1.StatefulSetInterface
)
//PreparePodAutoscaler contains the prepration steps and chaos injection steps
func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PreparePodAutoscaler contains the preparation steps and chaos injection steps
func PreparePodAutoscaler(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodAutoscalerFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -45,9 +52,9 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
switch strings.ToLower(experimentsDetails.AppKind) {
case "deployment", "deployments":
appsUnderTest, err := getDeploymentDetails(experimentsDetails, clients)
appsUnderTest, err := getDeploymentDetails(experimentsDetails)
if err != nil {
return errors.Errorf("fail to get the name & initial replica count of the deployment, err: %v", err)
return stacktrace.Propagate(err, "could not get deployment details")
}
deploymentList := []string{}
@@ -62,22 +69,22 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
//calling go routine which will continuously watch for the abort signal
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err = podAutoscalerChaosInDeployment(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("fail to perform autoscaling, err: %v", err)
if err = podAutoscalerChaosInDeployment(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not scale deployment")
}
if err = autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to rollback the autoscaling, err: %v", err)
return stacktrace.Propagate(err, "could not revert scaling in deployment")
}
case "statefulset", "statefulsets":
appsUnderTest, err := getStatefulsetDetails(experimentsDetails, clients)
appsUnderTest, err := getStatefulsetDetails(experimentsDetails)
if err != nil {
return errors.Errorf("fail to get the name & initial replica count of the statefulset, err: %v", err)
return stacktrace.Propagate(err, "could not get statefulset details")
}
stsList := []string{}
var stsList []string
for _, sts := range appsUnderTest {
stsList = append(stsList, sts.AppName)
}
@@ -89,16 +96,16 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
//calling go routine which will continuously watch for the abort signal
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err = podAutoscalerChaosInStatefulset(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("fail to perform autoscaling, err: %v", err)
if err = podAutoscalerChaosInStatefulset(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not scale statefulset")
}
if err = autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to rollback the autoscaling, err: %v", err)
return stacktrace.Propagate(err, "could not revert scaling in statefulset")
}
default:
return errors.Errorf("application type '%s' is not supported for the chaos", experimentsDetails.AppKind)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{kind: %s}", experimentsDetails.AppKind), Reason: "application type is not supported"}
}
//Waiting for the ramp time after chaos injection
@@ -109,38 +116,38 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
return nil
}
func getSliceOfTotalApplicationsTargeted(appList []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
func getSliceOfTotalApplicationsTargeted(appList []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails) []experimentTypes.ApplicationUnderTest {
slice := int(math.Round(float64(len(appList)*experimentsDetails.AppAffectPercentage) / float64(100)))
if slice < 0 || slice > len(appList) {
return nil, errors.Errorf("slice of applications to target out of range %d/%d", slice, len(appList))
}
return appList[:slice], nil
newAppListLength := math.Maximum(1, math.Adjustment(math.Minimum(experimentsDetails.AppAffectPercentage, 100), len(appList)))
return appList[:newAppListLength]
}
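The rewritten helper leans on the `pkg/math` utilities. Assuming `math.Adjustment` computes the (rounded) percentage share of the list length and `Minimum`/`Maximum` clamp the inputs and result (the exact rounding rule lives in that package), the selection logic can be sketched self-containedly as:

```go
package main

import (
	"fmt"
	"math"
)

// targetsFromPercentage models the selection above: clamp the percentage to
// 100, take that rounded share of the app list, and always target at least
// one app. The pkg/math helper names and rounding behavior are assumptions;
// consult the package for the authoritative definitions.
func targetsFromPercentage(total, percentage int) int {
	if percentage > 100 {
		percentage = 100 // math.Minimum(percentage, 100)
	}
	share := int(math.Round(float64(total*percentage) / 100.0)) // math.Adjustment
	if share < 1 {
		return 1 // math.Maximum(1, share)
	}
	return share
}

func main() {
	fmt.Println(targetsFromPercentage(4, 50)) // half the list
	fmt.Println(targetsFromPercentage(10, 0)) // never zero targets
}
```

Compared with the deleted range check, this formulation can no longer return an out-of-range slice length, which is why the error return was dropped.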
//getDeploymentDetails is used to get the name and total number of replicas of the deployment
func getDeploymentDetails(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) ([]experimentTypes.ApplicationUnderTest, error) {
// getDeploymentDetails is used to get the name and total number of replicas of the deployment
func getDeploymentDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
deploymentList, err := appsv1DeploymentClient.List(metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(deploymentList.Items) == 0 {
return nil, errors.Errorf("fail to get the deployments with matching labels, err: %v", err)
deploymentList, err := appsv1DeploymentClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(deploymentList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: "no deployment found with matching labels"}
}
appsUnderTest := []experimentTypes.ApplicationUnderTest{}
var appsUnderTest []experimentTypes.ApplicationUnderTest
for _, app := range deploymentList.Items {
log.Infof("[Info]: Found deployment name '%s' with replica count '%d'", app.Name, int(*app.Spec.Replicas))
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
}
// Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails)
// Applying the APP_AFFECTED_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
}
//getStatefulsetDetails is used to get the name and total number of replicas of the statefulsets
func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) ([]experimentTypes.ApplicationUnderTest, error) {
// getStatefulsetDetails is used to get the name and total number of replicas of the statefulsets
func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
statefulsetList, err := appsv1StatefulsetClient.List(metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(statefulsetList.Items) == 0 {
return nil, errors.Errorf("fail to get the statefulsets with matching labels, err: %v", err)
statefulsetList, err := appsv1StatefulsetClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(statefulsetList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: "no statefulset found with matching labels"}
}
appsUnderTest := []experimentTypes.ApplicationUnderTest{}
@@ -149,119 +156,106 @@ func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
}
// Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails)
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
}
//podAutoscalerChaosInDeployment scales up the replicas of deployment and verify the status
func podAutoscalerChaosInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// podAutoscalerChaosInDeployment scales up the replicas of deployment and verify the status
func podAutoscalerChaosInDeployment(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
for _, app := range appsUnderTest {
// Retrieve the latest version of Deployment before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1DeploymentClient.Get(app.AppName, metav1.GetOptions{})
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to get latest version of application deployment, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
}
// modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
log.Infof("Updating deployment '%s' to number of replicas '%d'", appUnderTest.ObjectMeta.Name, experimentsDetails.Replicas)
_, err = appsv1DeploymentClient.Update(appUnderTest)
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale deployment :%s", err.Error())}
}
common.SetTargets(app.AppName, "injected", "deployment", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("fail to update the replica count of the deployment, err: %v", retryErr)
return retryErr
}
log.Info("[Info]: The application started scaling")
if err = deploymentStatusCheck(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("application deployment status check failed, err: %v", err)
}
return nil
return deploymentStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
}
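The scaling loop above wraps the get-modify-update cycle in `retries.RetryOnConflict`, which re-runs the closure whenever the apiserver rejects the update with a 409 Conflict (another writer changed the object between the Get and the Update). A self-contained model of that loop; the real `k8s.io/client-go/util/retry.RetryOnConflict` also applies exponential backoff between attempts, which this sketch omits:

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for a Kubernetes 409 Conflict error.
var errConflict = errors.New("conflict: object was modified")

// retryOnConflict re-runs fn while it keeps returning the conflict error,
// up to maxRetries attempts; any other result (including nil) is returned
// immediately.
func retryOnConflict(maxRetries int, fn func() error) error {
	var err error
	for i := 0; i < maxRetries; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

func main() {
	attempts := 0
	err := retryOnConflict(5, func() error {
		attempts++
		if attempts < 3 {
			return errConflict // first two update cycles race with another writer
		}
		return nil // re-read picks up the fresh resourceVersion and succeeds
	})
	fmt.Println(attempts, err)
}
```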
//podAutoscalerChaosInStatefulset scales up the replicas of statefulset and verify the status
func podAutoscalerChaosInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// podAutoscalerChaosInStatefulset scales up the replicas of statefulset and verify the status
func podAutoscalerChaosInStatefulset(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
for _, app := range appsUnderTest {
// Retrieve the latest version of Statefulset before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(app.AppName, metav1.GetOptions{})
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to get latest version of the target statefulset application , err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
}
// modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
_, err = appsv1StatefulsetClient.Update(appUnderTest)
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale statefulset :%s", err.Error())}
}
common.SetTargets(app.AppName, "injected", "statefulset", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("fail to update the replica count of the statefulset application, err: %v", retryErr)
return retryErr
}
log.Info("[Info]: The application started scaling")
if err = statefulsetStatusCheck(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("statefulset application status check failed, err: %v", err)
}
return nil
return statefulsetStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
}
// deploymentStatusCheck check the status of deployment and verify the available replicas
func deploymentStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func deploymentStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
isFailed := false
err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
deployment, err := appsv1DeploymentClient.Get(app.AppName, metav1.GetOptions{})
deployment, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to find the deployment with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(deployment.Status.ReadyReplicas) != experimentsDetails.Replicas {
isFailed = true
return errors.Errorf("application %s is not scaled yet, the desired replica count is: %v and ready replica count is: %v", app.AppName, experimentsDetails.Replicas, deployment.Status.ReadyReplicas)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale deployment, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, deployment.Status.ReadyReplicas)}
}
}
isFailed = false
return nil
})
if isFailed {
if err = autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to perform the autoscaler recovery of the deployment, err: %v", err)
}
return errors.Errorf("fail to scale the deployment to the desired replica count in the given chaos duration")
}
if err != nil {
return err
if scaleErr := autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
}
return stacktrace.Propagate(err, "failed to scale replicas")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
if duration < experimentsDetails.ChaosDuration {
log.Info("[Wait]: Waiting for completion of chaos duration")
@@ -272,43 +266,37 @@ func deploymentStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails
}
// statefulsetStatusCheck check the status of statefulset and verify the available replicas
func statefulsetStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func statefulsetStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
isFailed := false
err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
statefulset, err := appsv1StatefulsetClient.Get(app.AppName, metav1.GetOptions{})
statefulset, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to find the statefulset with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(statefulset.Status.ReadyReplicas) != experimentsDetails.Replicas {
isFailed = true
return errors.Errorf("application %s is not scaled yet, the desired replica count is: %v and ready replica count is: %v", app.AppName, experimentsDetails.Replicas, statefulset.Status.ReadyReplicas)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale statefulset, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, statefulset.Status.ReadyReplicas)}
}
}
isFailed = false
return nil
})
if isFailed {
if err = autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to perform the autoscaler recovery of the application, err: %v", err)
}
return errors.Errorf("fail to scale the application to the desired replica count in the given chaos duration")
}
if err != nil {
return err
if scaleErr := autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
}
return stacktrace.Propagate(err, "failed to scale replicas")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -322,7 +310,7 @@ func statefulsetStatusCheck(experimentsDetails *experimentTypes.ExperimentDetail
return nil
}
//autoscalerRecoveryInDeployment rollback the replicas to initial values in deployment
// autoscalerRecoveryInDeployment rollback the replicas to initial values in deployment
func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
// Scale back to initial number of replicas
@@ -330,22 +318,22 @@ func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.Experime
// Retrieve the latest version of Deployment before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
for _, app := range appsUnderTest {
appUnderTest, err := appsv1DeploymentClient.Get(app.AppName, metav1.GetOptions{})
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to find the latest version of Application Deployment with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1DeploymentClient.Update(appUnderTest)
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in deployment :%s", err.Error())}
}
common.SetTargets(app.AppName, "reverted", "deployment", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("fail to rollback the deployment, err: %v", retryErr)
return retryErr
}
log.Info("[Info]: Application started rolling back to original replica count")
@@ -354,13 +342,13 @@ func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.Experime
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
applicationDeploy, err := appsv1DeploymentClient.Get(app.AppName, metav1.GetOptions{})
applicationDeploy, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to find the deployment with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("[Info]: Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
return errors.Errorf("fail to rollback to original replica count, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback deployment scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
}
}
log.Info("[RollBack]: Application rollback to the initial number of replicas")
@@ -368,7 +356,7 @@ func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.Experime
})
}
//autoscalerRecoveryInStatefulset rollback the replicas to initial values in deployment
// autoscalerRecoveryInStatefulset rolls back the replicas to initial values in statefulset
func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
// Scale back to initial number of replicas
@@ -376,22 +364,22 @@ func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.Experim
for _, app := range appsUnderTest {
// Retrieve the latest version of Statefulset before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(app.AppName, metav1.GetOptions{})
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("failed to find the latest version of Statefulset with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1StatefulsetClient.Update(appUnderTest)
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in statefulset :%s", err.Error())}
}
common.SetTargets(app.AppName, "reverted", "statefulset", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("fail to rollback the statefulset, err: %v", retryErr)
return retryErr
}
log.Info("[Info]: Application pod started rolling back")
@@ -400,13 +388,13 @@ func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.Experim
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
applicationDeploy, err := appsv1StatefulsetClient.Get(app.AppName, metav1.GetOptions{})
applicationDeploy, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to get the statefulset with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
return errors.Errorf("fail to roll back to original replica count, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback statefulset scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
}
}
log.Info("[RollBack]: Application roll back to initial number of replicas")
@@ -416,7 +404,7 @@ func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.Experim
func int32Ptr(i int32) *int32 { return &i }
//abortPodAutoScalerChaos go routine will continuously watch for the abort signal for the entire chaos duration and generate the required events and result
// abortPodAutoScalerChaos goroutine continuously watches for the abort signal for the entire chaos duration and generates the required events and result
func abortPodAutoScalerChaos(appsUnderTest []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) {
// signChan channel is used to transmit signal notifications.


@@ -1,13 +1,20 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -16,36 +23,61 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
var inject chan os.Signal
// PrepareCPUExecStress contains the chaos preparation and injection steps
func PrepareCPUExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodCPUHogExecFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
if err := experimentCPU(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not stress cpu")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressCPU uses the REST API to exec into the target container of the target pod
// It constantly increases CPU utilisation until the maximum available or allowed limit is reached
// TOTAL_CHAOS_DURATION specifies how long this experiment will last
func stressCPU(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, stressErr chan error) {
// It will contains all the pod & container details required for exec command
func stressCPU(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosInjectCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
//experimentCPU function orchestrates the experiment by calling the StressCPU function for every core, of every container, of every pod that is targeted
func experimentCPU(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// experimentCPU function orchestrates the experiment by calling the StressCPU function for every core, of every container, of every pod that is targeted
func experimentCPU(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@@ -54,36 +86,31 @@ func experimentCPU(experimentsDetails *experimentTypes.ExperimentDetails, client
}
log.Infof("Target pods list for chaos, %v", podNames)
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the cpu of all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -113,6 +140,11 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
@@ -120,7 +152,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
go stressCPU(experimentsDetails, pod.Name, clients, stressErr)
go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
@@ -140,18 +172,20 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
err := killStressCPUSerial(experimentsDetails, pod.Name, clients, chaosDetails)
if err != nil {
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@@ -160,8 +194,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
break loop
}
}
if err := killStressCPUSerial(experimentsDetails, pod.Name, clients, chaosDetails); err != nil {
return err
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert cpu stress")
}
}
}
@@ -169,13 +203,16 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode stresses the cpu of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -201,6 +238,10 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
@@ -208,7 +249,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"CPU CORE": experimentsDetails.CPUcores,
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
go stressCPU(experimentsDetails, pod.Name, clients, stressErr)
go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
}
@@ -229,7 +270,7 @@ loop:
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
@@ -237,9 +278,12 @@ loop:
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@@ -251,43 +295,19 @@ loop:
return killStressCPUParallel(experimentsDetails, targetPodList, clients, chaosDetails)
}
//PrepareCPUExecStress contains the chaos prepration and injection steps
func PrepareCPUExecStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
if err := experimentCPU(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// killStressCPUSerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It will contains all the pod & container details required for exec command
//
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosKillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to kill the stress process in %v pod, err: %v", podName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, ns), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil
@@ -296,12 +316,14 @@ func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails,
// killStressCPUParallel function to kill all the stress processes running inside the target containers
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressCPUParallel(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressCPUSerial(experimentsDetails, pod.Name, clients, chaosDetails); err != nil {
return err
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}


@@ -1,26 +1,33 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/annotation"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/workloads"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PreparePodDelete contains the prepration steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDeleteFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -28,17 +35,25 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
common.WaitForDuration(experimentsDetails.RampTime)
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -50,11 +65,13 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
}
// injectChaosInSerialMode deletes the target application pods in serial mode (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -67,33 +84,26 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
if chaosDetails.AppDetail.Kind != "" {
for _, pod := range targetPodList.Items {
parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
if err != nil {
return err
}
common.SetParentName(parentName, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
}
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
log.Infof("Target pods list: %v", podNames)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
@@ -108,18 +118,18 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"PodName": pod.Name})
if experimentsDetails.Force {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
return err
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
@@ -132,8 +142,15 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@@ -147,11 +164,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode delete the target application pods in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -164,33 +183,25 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
if chaosDetails.AppDetail.Kind != "" {
for _, pod := range targetPodList.Items {
parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
if err != nil {
return err
}
common.SetParentName(parentName, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
}
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
log.Infof("Target pods list: %v", podNames)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
@@ -205,19 +216,19 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"PodName": pod.Name})
if experimentsDetails.Force {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
return err
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
@@ -230,8 +241,15 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
@@ -240,3 +258,10 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
return nil
}
// SetChaosTunables will setup a random value within a given range of values
// If the value is not provided in range it'll setup the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}


@@ -1,7 +1,13 @@
package helper
import (
"bytes"
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/exec"
"os/signal"
@@ -17,17 +23,23 @@ import (
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var (
err error
abort, injectAbort chan os.Signal
err error
)
const (
// ProcessAlreadyKilled contains error code when process is already killed
ProcessAlreadyKilled = "no such process"
)
// Helper injects the dns chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodDNSFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -58,22 +70,70 @@ func Helper(clients clients.ClientSets) {
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := preparePodDNSChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
//preparePodDNSChaos contains the preparation steps before chaos injection
// preparePodDNSChaos contains the preparation steps before chaos injection
func preparePodDNSChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
containerID, err := getContainerID(experimentsDetails, clients)
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return err
return stacktrace.Propagate(err, "could not parse targets")
}
// extract out the pid of the target container
pid, err := common.GetPID(experimentsDetails.ContainerRuntime, containerID, experimentsDetails.SocketPath)
if err != nil {
return err
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-injectAbort:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
done := make(chan error, 1)
for index, t := range targets {
targets[index].Cmd, err = injectChaos(experimentsDetails, t)
if err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := terminateProcess(t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
@@ -83,105 +143,136 @@ func preparePodDNSChaos(experimentsDetails *experimentTypes.ExperimentDetails, c
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// prepare dns interceptor
commandTemplate := fmt.Sprintf("sudo TARGET_PID=%d CHAOS_TYPE=%s SPOOF_MAP='%s' TARGET_HOSTNAMES='%s' CHAOS_DURATION=%d MATCH_SCHEME=%s nsutil -p -n -t %d -- dns_interceptor", pid, experimentsDetails.ChaosType, experimentsDetails.SpoofMap, experimentsDetails.TargetHostNames, experimentsDetails.ChaosDuration, experimentsDetails.MatchScheme, pid)
cmd := exec.Command("/bin/bash", "-c", commandTemplate)
log.Info(cmd.String())
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
// injecting dns chaos inside target container
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
go func() {
select {
case <-injectAbort:
log.Info("[Chaos]: Abort received, skipping chaos injection")
default:
err = cmd.Run()
if err != nil {
log.Fatalf("dns interceptor failed : %v", err)
var errList []string
for _, t := range targets {
if err := t.Cmd.Wait(); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
log.Errorf("err: %v", strings.Join(errList, ", "))
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
}
done <- nil
}()
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
// check the timeout for the command
// Note: timeout will occur when process didn't complete even after 10s of chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
timeChan := time.Tick(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
// either wait for abort signal or chaos duration
select {
case <-abort:
log.Info("[Chaos]: Killing process started because of terminated signal received")
case <-timeChan:
log.Info("[Chaos]: Stopping the experiment, chaos duration over")
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
var errList []string
for _, t := range targets {
if err = terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
case doneErr := <-done:
select {
case <-injectAbort:
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
default:
log.Info("[Info]: Reverting Chaos")
var errList []string
for _, t := range targets {
if err := terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return doneErr
}
}
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
return nil
}
func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, t targetDetails) (*exec.Cmd, error) {
// prepare dns interceptor
var out bytes.Buffer
commandTemplate := fmt.Sprintf("sudo TARGET_PID=%d CHAOS_TYPE=%s SPOOF_MAP='%s' TARGET_HOSTNAMES='%s' CHAOS_DURATION=%d MATCH_SCHEME=%s nsutil -p -n -t %d -- dns_interceptor", t.Pid, experimentsDetails.ChaosType, experimentsDetails.SpoofMap, experimentsDetails.TargetHostNames, experimentsDetails.ChaosDuration, experimentsDetails.MatchScheme, t.Pid)
cmd := exec.Command("/bin/bash", "-c", commandTemplate)
log.Info(cmd.String())
cmd.Stdout = &out
cmd.Stderr = &out
if err = cmd.Start(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: experimentsDetails.ChaosPodName, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to inject chaos: %s", out.String())}
}
return cmd, nil
}
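`injectChaos` uses `cmd.Start` rather than `cmd.Run`, so the helper can launch one interceptor per target and collect completion later through `Cmd.Wait`, with a shared buffer keeping stdout and stderr for error reporting. That Start/Wait split, reduced to an illustrative shell one-liner:

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// startAndWait shows the Start/Wait split: Start returns as soon as the
// process is spawned (letting a caller start several), while Wait blocks
// until it exits; one buffer captures stdout and stderr combined.
func startAndWait(command string) (string, error) {
	var out bytes.Buffer
	cmd := exec.Command("/bin/sh", "-c", command)
	cmd.Stdout = &out
	cmd.Stderr = &out
	if err := cmd.Start(); err != nil { // process could not even be spawned
		return "", err
	}
	err := cmd.Wait() // blocks until the process exits
	return out.String(), err
}

func main() {
	out, err := startAndWait("echo chaos injected")
	fmt.Printf("out=%q err=%v\n", out, err)
}
```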
func terminateProcess(t targetDetails) error {
// kill command
killTemplate := fmt.Sprintf("sudo kill %d", t.Cmd.Process.Pid)
kill := exec.Command("/bin/bash", "-c", killTemplate)
var out bytes.Buffer
kill.Stderr = &out
kill.Stdout = &out
if err = kill.Run(); err != nil {
if strings.Contains(strings.ToLower(out.String()), ProcessAlreadyKilled) {
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to revert chaos %s", out.String())}
} else {
log.Errorf("dns interceptor process stopped")
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
return nil
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(targets []targetDetails, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("[Abort]: Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if cmd.Process == nil {
log.Infof("cannot kill dns interceptor, process not started. Retrying in 1sec...")
} else {
log.Infof("killing dns interceptor with pid %v", cmd.Process.Pid)
// kill command
killTemplate := fmt.Sprintf("sudo kill %d", cmd.Process.Pid)
kill := exec.Command("/bin/bash", "-c", killTemplate)
if err = kill.Run(); err != nil {
log.Errorf("unable to kill dns interceptor process, err: %v", err)
} else {
log.Errorf("dns interceptor process stopped")
break
for _, t := range targets {
if err = terminateProcess(t); err != nil {
log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Info("Chaos Revert Completed")
return nil
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
//getContainerID extract out the container id of the target container
func getContainerID(experimentDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
var containerID string
switch experimentDetails.ContainerRuntime {
case "docker":
host := "unix://" + experimentDetails.SocketPath
// deriving the container id of the pause container
cmd := "sudo docker --host " + host + " ps | grep k8s_POD_" + experimentDetails.TargetPods + "_" + experimentDetails.AppNS + " | awk '{print $1}'"
out, err := exec.Command("/bin/sh", "-c", cmd).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[docker]: Failed to run docker ps command: %s", string(out)))
return "", err
}
containerID = strings.TrimSpace(string(out))
case "containerd", "crio":
containerID, err = common.GetContainerID(experimentDetails.AppNS, experimentDetails.TargetPods, experimentDetails.TargetContainer, clients)
if err != nil {
return containerID, err
}
default:
return "", errors.Errorf("%v container runtime not supported", experimentDetails.ContainerRuntime)
}
log.Infof("Container ID: %v", containerID)
return containerID, nil
}
//getENV fetches all the env variables from the runner pod
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "60"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@@ -194,3 +285,14 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosType = types.Getenv("CHAOS_TYPE", "error")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
CommandPid int
Cmd *exec.Cmd
Source string
}
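The helper receives its targets through an env var built from `name:namespace:container` entries joined by `;` (see the `fmt.Sprintf("%s:%s:%s", …)` and `strings.Join(targetsPerNode, ";")` calls in the lib package below). A hypothetical decoder back into the `targetDetails` shape, assuming exactly that encoding:

```go
package main

import (
	"fmt"
	"strings"
)

// target mirrors the targetDetails fields that travel in the env var.
type target struct {
	Name, Namespace, Container string
}

// parseTargets splits "name:ns:ctr;name:ns:ctr" back into structs,
// rejecting malformed entries instead of guessing.
func parseTargets(raw string) ([]target, error) {
	var out []target
	for _, entry := range strings.Split(raw, ";") {
		parts := strings.Split(entry, ":")
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed target %q", entry)
		}
		out = append(out, target{Name: parts[0], Namespace: parts[1], Container: parts[2]})
	}
	return out, nil
}

func main() {
	targets, err := parseTargets("nginx-a:default:nginx;nginx-b:default:nginx")
	fmt.Println(targets, err)
}
```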


@@ -1,33 +1,41 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-dns-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDNSFault")
defer span.End()
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@@ -46,48 +54,41 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
}
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode inject the DNS Chaos in all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -95,38 +96,25 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// creating the helper pod to perform DNS Chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for pod-dns chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("Unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@@ -134,70 +122,53 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode inject the DNS Chaos in all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform DNS Chaos
for _, pod := range targetPodList.Items {
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for pod-dns chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("Unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodDNSFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
@ -230,7 +201,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name dns-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName),
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
@ -245,18 +216,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env vars required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@ -269,6 +245,8 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("MATCH_SCHEME", experimentsDetails.MatchScheme).
SetEnv("CHAOS_TYPE", experimentsDetails.ChaosType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV

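The chained `SetEnv(...)` calls above build the helper pod's env list fluently. A minimal, self-contained sketch of that builder pattern follows; the `EnvVar`/`ENVDetails` types here are simplified stand-ins for the real litmus-go types, and the skip-empty behaviour is an assumption for illustration:

```go
package main

import "fmt"

// EnvVar is a simplified stand-in for a Kubernetes env entry.
type EnvVar struct {
	Name, Value string
}

// ENVDetails accumulates env vars. SetEnv returns the receiver so calls
// can be chained, and (in this sketch) skips empty values so optional
// settings don't emit blank entries.
type ENVDetails struct {
	ENV []EnvVar
}

func (e *ENVDetails) SetEnv(key, value string) *ENVDetails {
	if value != "" {
		e.ENV = append(e.ENV, EnvVar{Name: key, Value: value})
	}
	return e
}

func main() {
	var env ENVDetails
	env.SetEnv("TARGETS", "nginx-pod").
		SetEnv("TOTAL_CHAOS_DURATION", "60").
		SetEnv("INSTANCE_ID", "") // skipped: empty values are not emitted
	fmt.Println(len(env.ENV)) // 2
}
```

Returning the receiver from each setter is what lets the experiment code declare its whole environment in one readable chain.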
View File

@ -1,6 +1,7 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
@ -8,7 +9,13 @@ import (
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-fio-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@ -16,15 +23,36 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
// PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodFIOStressFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Fio stress experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressStorage uses the REST API to exec into the target container of the target pod.
// It keeps increasing the storage utilisation until it reaches the maximum available or allowed amount.
// TOTAL_CHAOS_DURATION specifies how long this experiment lasts.
func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, stressErr chan error) {
func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
log.Infof("The storage consumption is: %vM", experimentDetails.Size)
@ -37,23 +65,24 @@ func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName
log.Infof("Running the command:\n%v", fioCmd)
command := []string{"/bin/sh", "-c", fioCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentDetails.TargetContainer, experimentDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentDetails.TargetContainer, ns)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
//experimentExecution function orchestrates the experiment by calling the StressStorage function, of every container, of every pod that is targeted
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// experimentExecution function orchestrates the experiment by calling the StressStorage function, of every container, of every pod that is targeted
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide either of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@ -62,38 +91,33 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
}
log.Infof("Target pods list for chaos, %v", podNames)
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the storage of all target applications in serial mode (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInSerialMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -108,13 +132,17 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Space Consumption(MB)": experimentsDetails.Size,
})
go stressStorage(experimentsDetails, pod.Name, clients, stressErr)
go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
@ -130,19 +158,25 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
case err := <-stressErr:
// skip further execution and mark the result as fail if any error other than 137 is received while executing the stress command
// error code 137 (oom kill) is ignored: further execution is skipped and the result is marked as pass
// oom kill occurs if stor to be stressed exceed than the resource limit for the target container
// oom kill occurs if resource to be stressed exceed than the resource limit for the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress storage of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@ -151,21 +185,23 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
break loop
}
}
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
// injectChaosInParallelMode stresses the storage of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -180,13 +216,17 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Storage Consumption(MB)": experimentsDetails.Size,
})
go stressStorage(experimentsDetails, pod.Name, clients, stressErr)
go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
@ -202,19 +242,25 @@ loop:
case err := <-stressErr:
// skip further execution and mark the result as fail if any error other than 137 is received while executing the stress command
// error code 137 (oom kill) is ignored: further execution is skipped and the result is marked as pass
// oom kill occurs if stor to be stressed exceed than the resource limit for the target container
// oom kill occurs if resource to be stressed exceed than the resource limit for the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to inject chaos: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@ -222,58 +268,41 @@ loop:
break loop
}
}
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
return nil
}
//PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Fio stress experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// killStressSerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
//
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressSerial(containerName, podName, namespace, KillCmd string, clients clients.ClientSets) error {
// It will contains all the pod & container details required for exec command
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", KillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to kill stress process inside target container, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
return nil
}
// killStressParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressParallel(containerName string, targetPodList corev1.PodList, namespace, KillCmd string, clients clients.ClientSets) error {
func killStressParallel(containerName string, targetPodList corev1.PodList, KillCmd string, clients clients.ClientSets) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressSerial(containerName, pod.Name, namespace, KillCmd, clients); err != nil {
return err
if err := killStressSerial(containerName, pod.Name, pod.Namespace, KillCmd, clients); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}

View File

@ -1,6 +1,7 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
@ -9,7 +10,12 @@ import (
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@ -18,13 +24,39 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
var inject chan os.Signal
// PrepareMemoryExecStress contains the chaos preparation and injection steps
func PrepareMemoryExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodMemoryHogExecFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
if err := experimentMemory(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not stress memory")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressMemory uses the REST API to exec into the target container of the target pod.
// It keeps increasing the memory utilisation until it reaches the maximum available or allowed amount.
// TOTAL_CHAOS_DURATION specifies how long this experiment lasts.
@ -39,22 +71,23 @@ func stressMemory(MemoryConsumption, containerName, podName, namespace string, c
command := []string{"/bin/sh", "-c", ddCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
//experimentMemory function orchestrates the experiment by calling the StressMemory function, of every container, of every pod that is targeted
func experimentMemory(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// experimentMemory function orchestrates the experiment by calling the StressMemory function, of every container, of every pod that is targeted
func experimentMemory(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@ -63,36 +96,31 @@ func experimentMemory(experimentsDetails *experimentTypes.ExperimentDetails, cli
}
log.Infof("Target pods list for chaos, %v", podNames)
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the memory of all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -122,12 +150,17 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr)
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
@ -146,17 +179,20 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@ -165,8 +201,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
break loop
}
}
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
return err
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert memory stress")
}
}
}
@ -174,13 +210,15 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode stresses the memory of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -207,13 +245,19 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
//It checks the empty target container for the first iteration only
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr)
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
}
}
@ -232,13 +276,20 @@ loop:
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
if err := killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@ -246,36 +297,12 @@ loop:
break loop
}
}
return killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails)
}
//PrepareMemoryExecStress contains the chaos prepration and injection steps
func PrepareMemoryExecStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
if err := experimentMemory(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
return killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails)
}
// killStressMemorySerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
//
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It will contain all the pod & container details required for the exec command
execCommandDetails := litmusexec.PodDetails{}
@ -283,9 +310,9 @@ func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string
command := []string{"/bin/sh", "-c", memFreeCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to kill stress process inside target container, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil
@ -293,13 +320,15 @@ func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string
// killStressMemoryParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressMemoryParallel(containerName string, targetPodList corev1.PodList, namespace, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
func killStressMemoryParallel(containerName string, targetPodList corev1.PodList, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressMemorySerial(containerName, pod.Name, namespace, memFreeCmd, clients, chaosDetails); err != nil {
return err
if err := killStressMemorySerial(containerName, pod.Name, pod.Namespace, memFreeCmd, clients, chaosDetails); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
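The revised `killStressMemoryParallel` above switches from fail-fast to error aggregation: every target pod gets a revert attempt, and the failures are joined into one error at the end. A minimal stdlib-only sketch of that pattern (the `revertTarget` helper is hypothetical, standing in for `killStressMemorySerial`):

```go
package main

import (
	"fmt"
	"strings"
)

// revertTarget is a hypothetical stand-in for killStressMemorySerial:
// it reverts chaos on one target and may fail.
func revertTarget(name string) error {
	if strings.HasPrefix(name, "bad") {
		return fmt.Errorf("failed to revert %s", name)
	}
	return nil
}

// revertAll mirrors the aggregation pattern: instead of returning on the
// first failure, it attempts every target and collects the errors, so one
// broken pod does not block the revert of the others.
func revertAll(targets []string) error {
	var errList []string
	for _, t := range targets {
		if err := revertTarget(t); err != nil {
			errList = append(errList, err.Error())
		}
	}
	if len(errList) != 0 {
		// the real code wraps this in cerrors.PreserveError
		return fmt.Errorf("[%s]", strings.Join(errList, ","))
	}
	return nil
}

func main() {
	fmt.Println(revertAll([]string{"pod-1", "bad-pod", "pod-2"}))
	// prints: [failed to revert bad-pod]
}
```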


@ -1,11 +1,14 @@
package lib
import (
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/palantir/stacktrace"
"strings"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"github.com/pkg/errors"
"gopkg.in/yaml.v2"
corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1"
@ -51,12 +54,12 @@ func (np *NetworkPolicy) getNetworkPolicyDetails(experimentsDetails *experimentT
// sets the ports for the traffic control
if err := np.setPort(experimentsDetails.PORTS); err != nil {
return err
return stacktrace.Propagate(err, "could not set port")
}
// sets the destination ips for which the traffic should be blocked
if err := np.setExceptIPs(experimentsDetails); err != nil {
return err
return stacktrace.Propagate(err, "could not set ips")
}
// sets the egress traffic rules
@ -137,11 +140,11 @@ func (np *NetworkPolicy) setNamespaceSelector(nsLabel string) *NetworkPolicy {
// setPort sets all the protocols and ports
func (np *NetworkPolicy) setPort(p string) error {
ports := []networkv1.NetworkPolicyPort{}
var ports []networkv1.NetworkPolicyPort
var port Port
// unmarshal the protocols and ports from the env
if err := yaml.Unmarshal([]byte(strings.TrimSpace(parseCommand(p))), &port); err != nil {
return errors.Errorf("Unable to unmarshal, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("failed to unmarshal ports: %s", err.Error())}
}
// sets all the tcp ports
@ -179,9 +182,9 @@ func getPort(port int32, protocol corev1.Protocol) networkv1.NetworkPolicyPort {
// for which traffic should be blocked
func (np *NetworkPolicy) setExceptIPs(experimentsDetails *experimentTypes.ExperimentDetails) error {
// get all the target ips
destinationIPs, err := network_chaos.GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts)
destinationIPs, err := network_chaos.GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients.ClientSets{}, false)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get destination ips")
}
ips := strings.Split(destinationIPs, ",")


@ -1,11 +1,19 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@ -14,7 +22,7 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1"
@ -25,8 +33,10 @@ var (
inject, abort chan os.Signal
)
//PrepareAndInjectChaos contains the prepration & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkPartitionFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -39,13 +49,14 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// validate the appLabels
if chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide the appLabel")
if chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide the appLabel"}
}
// Get the target pod details for the chaos execution
targetPodList, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).List(v1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
targetPodList, err := common.GetPodList("", 100, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@ -55,7 +66,7 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
log.Infof("Target pods list for chaos, %v", podNames)
// generate a unique string
runID := common.GetRunID()
runID := stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -66,7 +77,7 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
// collect all the data for the network policy
np := initialize()
if err := np.getNetworkPolicyDetails(experimentsDetails); err != nil {
return err
return stacktrace.Propagate(err, "could not get network policy details")
}
//DISPLAY THE NETWORK POLICY DETAILS
@ -80,11 +91,11 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
})
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, clients, chaosDetails, resultDetails, targetPodList, runID)
go abortWatcher(experimentsDetails, clients, chaosDetails, resultDetails, &targetPodList, runID)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -95,8 +106,8 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
os.Exit(0)
default:
// creating the network policy to block the traffic
if err := createNetworkPolicy(experimentsDetails, clients, np, runID); err != nil {
return err
if err := createNetworkPolicy(ctx, experimentsDetails, clients, np, runID); err != nil {
return stacktrace.Propagate(err, "could not create network policy")
}
// updating chaos status to injected for the target pods
for _, pod := range targetPodList.Items {
@ -105,16 +116,16 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
}
// verify the presence of network policy inside cluster
if err := checkExistanceOfPolicy(experimentsDetails, clients, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return err
if err := checkExistenceOfPolicy(experimentsDetails, clients, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return stacktrace.Propagate(err, "could not check existence of network policy")
}
log.Infof("[Wait]: Wait for %v chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// deleting the network policy after chaos duration over
if err := deleteNetworkPolicy(experimentsDetails, clients, targetPodList, chaosDetails, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return err
if err := deleteNetworkPolicy(experimentsDetails, clients, &targetPodList, chaosDetails, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return stacktrace.Propagate(err, "could not delete network policy")
}
// updating chaos status to reverted for the target pods
@ -133,7 +144,9 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
// createNetworkPolicy creates the network policy in the application namespace
// it blocks ingress/egress traffic for the targeted application for specific/all IPs
func createNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, networkPolicy *NetworkPolicy, runID string) error {
func createNetworkPolicy(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, networkPolicy *NetworkPolicy, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkPartitionFault")
defer span.End()
np := &networkv1.NetworkPolicy{
ObjectMeta: v1.ObjectMeta{
@ -155,25 +168,30 @@ func createNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
},
}
_, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Create(np)
return err
_, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Create(context.Background(), np, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to create network policy: %s", err.Error())}
}
return nil
}
// deleteNetworkPolicy deletes the network policy and waits until the network policy is deleted completely
func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, targetPodList *corev1.PodList, chaosDetails *types.ChaosDetails, timeout, delay int, runID string) error {
name := experimentsDetails.ExperimentName + "-np-" + runID
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
if err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Delete(name, &v1.DeleteOptions{}); err != nil {
return err
if err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Delete(context.Background(), name, v1.DeleteOptions{}); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{name: %s, namespace: %s}", name, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to delete network policy: %s", err.Error())}
}
err := retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(v1.ListOptions{LabelSelector: labels})
if err != nil || len(npList.Items) != 0 {
return errors.Errorf("Unable to delete the network policy, err: %v", err)
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) != 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "network policies are not deleted within timeout"}
}
return nil
})
@ -188,17 +206,19 @@ func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
return nil
}
// checkExistanceOfPolicy validate the presence of network policy inside the application namespace
func checkExistanceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, timeout, delay int, runID string) error {
// checkExistenceOfPolicy validate the presence of network policy inside the application namespace
func checkExistenceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, timeout, delay int, runID string) error {
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
return retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(v1.ListOptions{LabelSelector: labels})
if err != nil || len(npList.Items) == 0 {
return errors.Errorf("no network policy found, err: %v", err)
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "no network policy found with matching labels"}
}
return nil
})
@ -214,8 +234,13 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err := checkExistanceOfPolicy(experimentsDetails, clients, 2, 1, runID); err != nil {
log.Infof("no active network policy found, err: %v", err)
if err := checkExistenceOfPolicy(experimentsDetails, clients, 2, 1, runID); err != nil {
if error, ok := err.(cerrors.Error); ok {
if strings.Contains(error.Reason, "no network policy found with matching labels") {
break
}
}
log.Infof("no active network policy found, err: %v", err.Error())
retry--
continue
}
@ -223,10 +248,12 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients
if err := deleteNetworkPolicy(experimentsDetails, clients, targetPodList, chaosDetails, 2, 1, runID); err != nil {
log.Errorf("unable to delete network policy, err: %v", err)
}
retry--
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("Chaos Revert Completed")
os.Exit(0)


@ -0,0 +1,260 @@
package lib
import (
"fmt"
"go.opentelemetry.io/otel"
"golang.org/x/net/context"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/rds"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
)
var (
err error
inject, abort chan os.Signal
)
func PrepareRDSInstanceStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRDSInstanceStop")
defer span.End()
// Inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// Abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Get the instance identifier or list of instance identifiers
instanceIdentifierList := strings.Split(experimentsDetails.RDSInstanceIdentifier, ",")
if experimentsDetails.RDSInstanceIdentifier == "" || len(instanceIdentifierList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no RDS instance identifier found to stop"}
}
instanceIdentifierList = common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, instanceIdentifierList)
log.Infof("[Chaos]: Number of instances targeted: %v", len(instanceIdentifierList))
// Watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIdentifierList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will inject the rds instance stop chaos in serial mode, that is, one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode will inject the rds instance termination in parallel mode that is all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// PowerOff the instance
for _, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
// Run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
for _, identifier := range instanceIdentifierList {
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// watching for the abort signal and revert the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, identifier := range instanceIdentifierList {
instanceState, err := awslib.GetRDSInstanceStatus(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" {
log.Info("[Abort]: Waiting for the RDS instance to get down")
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting RDS instance as abort signal received")
err := awslib.RDSInstanceStart(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("RDS instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}


@ -1,31 +1,38 @@
package lib
import (
"context"
"fmt"
"time"
redfishLib "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish-node-restart/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
//injectChaos initiates node restart chaos on the target node
func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
// injectChaos initiates node restart chaos on the target node
func injectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectRedfishNodeRestartFault")
defer span.End()
URL := fmt.Sprintf("https://%v/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset", experimentsDetails.IPMIIP)
return redfishLib.RebootNode(URL, experimentsDetails.User, experimentsDetails.Password)
}
//experimentExecution function orchestrates the experiment by calling the injectChaos function
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// experimentExecution function orchestrates the experiment by calling the injectChaos function
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -36,17 +43,19 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if err := injectChaos(experimentsDetails, clients); err != nil {
return err
if err := injectChaos(ctx, experimentsDetails, clients); err != nil {
return stacktrace.Propagate(err, "chaos injection failed")
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
time.Sleep(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
return nil
}
//PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRedfishNodeRestartFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -54,7 +63,7 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Redfish node restart experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
common.SetTargets(experimentsDetails.IPMIIP, "targeted", "node", chaosDetails)
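The Redfish reboot above resolves to a POST against the `ComputerSystem.Reset` action URL built from the node's IPMI IP. A minimal sketch of constructing that request (the host, credentials, and the `ForceRestart` reset type are illustrative; the actual call lives in `redfishLib.RebootNode`):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// buildResetRequest constructs the Redfish ComputerSystem.Reset request
// used to restart a node; it performs no network I/O.
func buildResetRequest(ip, user, password string) (*http.Request, error) {
	url := fmt.Sprintf("https://%v/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset", ip)
	// ForceRestart is one standard Redfish ResetType; the experiment's
	// actual payload may differ.
	body, err := json.Marshal(map[string]string{"ResetType": "ForceRestart"})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, password)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := buildResetRequest("10.0.0.5", "admin", "secret")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Sending the request would additionally need a TLS-aware `http.Client`, since BMCs commonly serve self-signed certificates.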


@@ -0,0 +1,403 @@
package lib
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/spring-boot/spring-boot-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
)
var revertAssault = experimentTypes.ChaosMonkeyAssaultRevert{
LatencyActive: false,
KillApplicationActive: false,
CPUActive: false,
MemoryActive: false,
ExceptionsActive: false,
}
// SetTargetPodList selects the targeted pods and adds them to the experimentsDetails
func SetTargetPodList(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
var err error
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
}
if experimentsDetails.TargetPodList, err = common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails); err != nil {
return err
}
return nil
}
// PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareSpringBootFault")
defer span.End()
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
log.InfoWithValues("[Info]: Chaos monkeys watchers will be injected to the target pods as follows", logrus.Fields{
"WebClient": experimentsDetails.ChaosMonkeyWatchers.WebClient,
"Service": experimentsDetails.ChaosMonkeyWatchers.Service,
"Component": experimentsDetails.ChaosMonkeyWatchers.Component,
"Repository": experimentsDetails.ChaosMonkeyWatchers.Repository,
"Controller": experimentsDetails.ChaosMonkeyWatchers.Controller,
"RestController": experimentsDetails.ChaosMonkeyWatchers.RestController,
})
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// CheckChaosMonkey verifies if chaos monkey for spring boot is available in the selected pods
// All pods are checked even if some fail; if any pod fails the check, an error is returned
func CheckChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, targetPods corev1.PodList) (bool, error) {
hasErrors := false
targetPodNames := []string{}
for _, pod := range targetPods.Items {
targetPodNames = append(targetPodNames, pod.Name)
endpoint := "http://" + pod.Status.PodIP + ":" + chaosMonkeyPort + chaosMonkeyPath
log.Infof("[Check]: Checking pod: %v (endpoint: %v)", pod.Name, endpoint)
resp, err := http.Get(endpoint)
if err != nil {
log.Errorf("failed to request chaos monkey endpoint on pod %s, %s", pod.Name, err.Error())
hasErrors = true
continue
}
if resp.StatusCode != 200 {
log.Errorf("failed to get chaos monkey endpoint on pod %s (status: %d)", pod.Name, resp.StatusCode)
hasErrors = true
}
}
if hasErrors {
return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{podNames: %s}", targetPodNames), Reason: "failed to check chaos monkey on at least one pod, check logs for details"}
}
return true, nil
}
// enableChaosMonkey enables chaos monkey on selected pods
func enableChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error {
log.Infof("[Chaos]: Enabling Chaos Monkey on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/enable", "", nil) //nolint:bodyclose
if err != nil {
return err
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to enable chaos monkey endpoint (status: %d)", resp.StatusCode)}
}
return nil
}
func setChaosMonkeyWatchers(chaosMonkeyPort string, chaosMonkeyPath string, watchers experimentTypes.ChaosMonkeyWatchers, pod corev1.Pod) error {
log.Infof("[Chaos]: Setting Chaos Monkey watchers on pod: %v", pod.Name)
jsonValue, err := json.Marshal(watchers)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey watchers, %s", err.Error())}
}
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/watchers", "application/json", bytes.NewBuffer(jsonValue))
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set watchers, %s", err.Error())}
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set watchers (status: %d)", resp.StatusCode)}
}
return nil
}
func startAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault []byte, pod corev1.Pod) error {
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, assault, pod); err != nil {
return err
}
log.Infof("[Chaos]: Activating Chaos Monkey assault on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults/runtime/attack", "", nil)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to start assault, %s", err.Error())}
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to activate runtime attack (status: %d)", resp.StatusCode)}
}
return nil
}
func setChaosMonkeyAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault []byte, pod corev1.Pod) error {
log.Infof("[Chaos]: Setting Chaos Monkey assault on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults", "application/json", bytes.NewBuffer(assault))
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set assault, %s", err.Error())}
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set assault (status: %d)", resp.StatusCode)}
}
return nil
}
// disableChaosMonkey disables chaos monkey on selected pods
func disableChaosMonkey(ctx context.Context, chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error {
log.Infof("[Chaos]: disabling assaults on pod %s", pod.Name)
jsonValue, err := json.Marshal(revertAssault)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey revert-chaos watchers, %s", err.Error())}
}
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, jsonValue, pod); err != nil {
return err
}
log.Infof("[Chaos]: disabling chaos monkey on pod %s", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/disable", "", nil)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to disable assault, %s", err.Error())}
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to disable chaos monkey endpoint (status: %d)", resp.StatusCode)}
}
return nil
}
// injectChaosInSerialMode injects chaos monkey assault on pods in serial mode(one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-signChan:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range experimentsDetails.TargetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
_ = events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: Injecting on target pod", logrus.Fields{
"Target Pod": pod.Name,
})
if err := setChaosMonkeyWatchers(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyWatchers, pod); err != nil {
log.Errorf("[Chaos]: Failed to set watchers, err: %v ", err)
return err
}
if err := startAssault(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyAssault, pod); err != nil {
log.Errorf("[Chaos]: Failed to set assault, err: %v ", err)
return err
}
if err := enableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("[Chaos]: Failed to enable chaos, err: %v ", err)
return err
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
endTime = time.After(timeDelay)
loop:
for {
select {
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
return err
}
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
}
return nil
}
// injectChaosInParallelMode injects chaos monkey assault on pods in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-signChan:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range experimentsDetails.TargetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
_ = events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Pod": pod.Name,
})
if err := setChaosMonkeyWatchers(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyWatchers, pod); err != nil {
log.Errorf("[Chaos]: Failed to set watchers, err: %v", err)
return err
}
if err := startAssault(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyAssault, pod); err != nil {
log.Errorf("[Chaos]: Failed to set assault, err: %v", err)
return err
}
if err := enableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("[Chaos]: Failed to enable chaos, err: %v", err)
return err
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
}
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
}
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Revert Started")
for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
var errorList []string
for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
errorList = append(errorList, err.Error())
continue
}
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
if len(errorList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("error in disabling chaos monkey, [%s]", strings.Join(errorList, ","))}
}
return nil
}
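The Spring Boot fault above drives the Chaos Monkey actuator over a handful of endpoints (`/watchers`, `/assaults`, `/assaults/runtime/attack`, `/enable`, `/disable`), each URL built by plain string concatenation from the pod IP, port, and base path. A small sketch of that URL construction (the pod IP and port are illustrative; `/actuator/chaosmonkey` is the library's usual base path, but the experiment takes it from `ChaosMonkeyPath`):

```go
package main

import "fmt"

// monkeyEndpoint builds the Chaos Monkey actuator URL for one pod,
// mirroring the string concatenation used by the experiment above.
func monkeyEndpoint(podIP, port, basePath, action string) string {
	return "http://" + podIP + ":" + port + basePath + action
}

func main() {
	base := "/actuator/chaosmonkey" // common default; illustrative
	for _, action := range []string{"/enable", "/watchers", "/assaults", "/assaults/runtime/attack", "/disable"} {
		fmt.Println(monkeyEndpoint("10.1.2.3", "8080", base, action))
	}
}
```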


@@ -3,7 +3,7 @@ package helper
import (
"bufio"
"bytes"
"encoding/json"
"context"
"fmt"
"io"
"os"
@@ -16,19 +16,27 @@ import (
"time"
"github.com/containerd/cgroups"
cgroupsv2 "github.com/containerd/cgroups/v2"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
clientTypes "k8s.io/apimachinery/pkg/types"
)
//list of cgroups in a container
// list of cgroups in a container
var (
cgroupSubsystemList = []string{"cpu", "memory", "systemd", "net_cls",
"net_prio", "freezer", "blkio", "perf_event", "devices", "cpuset",
@@ -44,10 +52,14 @@ var (
const (
// ProcessAlreadyFinished contains the error message returned when the process has already finished
ProcessAlreadyFinished = "os: process already finished"
// ProcessAlreadyKilled contains the error message returned when the process is already killed
ProcessAlreadyKilled = "no such process"
)
// Helper injects the stress chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodStressFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -70,6 +82,7 @@ func Helper(clients clients.ClientSets) {
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
@@ -78,150 +91,260 @@ func Helper(clients clients.ClientSets) {
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := prepareStressChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
//prepareStressChaos contains the chaos preparation and injection steps
// prepareStressChaos contains the chaos preparation and injection steps
func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
// get stressors in list format
stressorList := prepareStressor(experimentsDetails)
if len(stressorList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "fail to prepare stressors"}
}
stressors := strings.Join(stressorList, " ")
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []*targetDetails
for _, t := range targetList.Target {
td := &targetDetails{
Name: t.Name,
Namespace: t.Namespace,
Source: chaosDetails.ChaosPodName,
}
td.TargetContainers, err = common.GetTargetContainers(t.Name, t.Namespace, t.TargetContainer, chaosDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get target containers")
}
td.ContainerIds, err = common.GetContainerIDs(td.Namespace, td.Name, td.TargetContainers, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container ids")
}
for _, cid := range td.ContainerIds {
// extract out the pid of the target container
pid, err := common.GetPID(experimentsDetails.ContainerRuntime, cid, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
td.Pids = append(td.Pids, pid)
}
for i := range td.Pids {
cGroupManagers, err, grpPath := getCGroupManager(td, i)
if err != nil {
return stacktrace.Propagate(err, "could not get cgroup manager")
}
td.GroupPath = grpPath
td.CGroupManagers = append(td.CGroupManagers, cGroupManagers)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"TargetContainers": td.TargetContainers,
})
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients)
if err != nil {
return err
}
// extract out the pid of the target container
targetPID, err := getPID(experimentsDetails, containerID)
if err != nil {
return err
}
done := make(chan error, 1)
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//get the pid path and check cgroup
path := pidPath(targetPID)
cgroup, err := findValidCgroup(path, containerID)
if err != nil {
return errors.Errorf("fail to get cgroup, err: %v", err)
}
// load the existing cgroup
control, err := cgroups.Load(cgroups.V1, cgroups.StaticPath(cgroup))
if err != nil {
return errors.Errorf("fail to load the cgroup, err: %v", err)
}
// get stressors in list format
stressorList := prepareStressor(experimentsDetails)
if len(stressorList) == 0 {
return errors.Errorf("fail to prepare stressor for %v experiment", experimentsDetails.ExperimentName)
}
stressors := strings.Join(stressorList, " ")
stressCommand := "pause nsutil -t " + strconv.Itoa(targetPID) + " -p -- " + stressors
log.Infof("[Info]: starting process: %v", stressCommand)
// launch the stress-ng process on the target container in paused mode
cmd := exec.Command("/bin/bash", "-c", stressCommand)
var buf bytes.Buffer
cmd.Stdout = &buf
err = cmd.Start()
if err != nil {
return errors.Errorf("fail to start the stress process %v, err: %v", stressCommand, err)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(cmd.Process.Pid, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails.TargetPods)
// add the stress process to the cgroup of target container
if err = control.Add(cgroups.Process{Pid: cmd.Process.Pid}); err != nil {
if killErr := cmd.Process.Kill(); killErr != nil {
return errors.Errorf("stressors failed killing %v process, err: %v", cmd.Process.Pid, killErr)
}
return errors.Errorf("fail to add the stress process into target container cgroup, err: %v", err)
}
log.Info("[Info]: Sending signal to resume the stress process")
// wait for the process to start before sending the resume signal
// TODO: need a dynamic way to check the start of the process
time.Sleep(700 * time.Millisecond)
// remove pause and resume or start the stress process
if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
return errors.Errorf("fail to remove pause and start the stress process: %v", err)
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
done := make(chan error)
go func() { done <- cmd.Wait() }()
// check the timeout for the command
// Note: the timeout fires if the process has not completed even 30s after the chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
if err = terminateProcess(cmd.Process.Pid); err != nil {
return err
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
return nil
case err := <-done:
for index, t := range targets {
for i := range t.Pids {
cmd, err := injectChaos(t, stressors, i, experimentsDetails.StressType)
if err != nil {
err, ok := err.(*exec.ExitError)
if ok {
status := err.Sys().(syscall.WaitStatus)
if status.Signaled() && status.Signal() == syscall.SIGTERM {
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
return errors.Errorf("process stopped with SIGTERM signal")
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
targets[index].Cmds = append(targets[index].Cmds, cmd)
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
go func() {
var errList []string
var exitErr error
for _, t := range targets {
for i := range t.Cmds {
if err := t.Cmds[i].Cmd.Wait(); err != nil {
log.Infof("stress process failed, err: %v, out: %v", err, t.Cmds[i].Buffer.String())
if _, ok := err.(*exec.ExitError); ok {
exitErr = err
continue
}
errList = append(errList, err.Error())
}
}
}
if exitErr != nil {
oomKilled, err := checkOOMKilled(targets, clients, exitErr)
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- exitErr
}
done <- nil
} else if len(errList) != 0 {
oomKilled, err := checkOOMKilled(targets, clients, fmt.Errorf("err: %v", strings.Join(errList, ", ")))
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
}
done <- nil
} else {
done <- nil
}
}()
// check the timeout for the command
// Note: the timeout fires if the process has not completed even 30s after the chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
case err := <-done:
if err != nil {
exitErr, ok := err.(*exec.ExitError)
if ok {
status := exitErr.Sys().(syscall.WaitStatus)
if status.Signaled() {
log.Infof("process stopped with signal: %v", status.Signal())
}
if status.Signaled() && status.Signal() == syscall.SIGKILL {
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Source: chaosDetails.ChaosPodName, Reason: "process stopped with SIGKILL signal"}
}
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: chaosDetails.ChaosPodName, Reason: err.Error()}
}
log.Info("[Info]: Reverting Chaos")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
func revertChaosForAllTargets(targets []*targetDetails, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
if err := terminateProcess(targets[i]); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// checkOOMKilled checks if the container within the target pods failed due to an OOMKilled error.
func checkOOMKilled(targets []*targetDetails, clients clients.ClientSets, chaosError error) (bool, error) {
// retry the check a few times so the container status has time to reflect an OOM kill
for i := 0; i < 3; i++ {
for _, t := range targets {
// Fetch the target pod
targetPod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return false, cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace),
Reason: err.Error(),
}
}
for _, c := range targetPod.Status.ContainerStatuses {
if utils.Contains(c.Name, t.TargetContainers) {
// Check for OOMKilled on the last termination state
if c.LastTerminationState.Terminated != nil && c.LastTerminationState.Terminated.ExitCode == 137 {
log.Warnf("[Warning]: The target container '%s' of pod '%s' got OOM Killed, err: %v", c.Name, t.Name, chaosError)
return true, nil
}
}
}
}
time.Sleep(1 * time.Second)
}
return false, nil
}
// terminateProcess will remove the stress process from the target container after chaos completion
func terminateProcess(t *targetDetails) error {
var errList []string
for i := range t.Cmds {
if t.Cmds[i] != nil && t.Cmds[i].Cmd.Process != nil {
if err := syscall.Kill(-t.Cmds[i].Cmd.Process.Pid, syscall.SIGKILL); err != nil {
if strings.Contains(err.Error(), ProcessAlreadyKilled) || strings.Contains(err.Error(), ProcessAlreadyFinished) {
continue
}
errList = append(errList, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[i]), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}.Error())
continue
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
//terminateProcess will remove the stress process from the target container after chaos completion
func terminateProcess(pid int) error {
process, err := os.FindProcess(pid)
if err != nil {
return errors.Errorf("unreachable path, err: %v", err)
}
if err = process.Signal(syscall.SIGTERM); err != nil && err.Error() != ProcessAlreadyFinished {
return errors.Errorf("error while killing process, err: %v", err)
}
log.Info("[Info]: Stress process removed successfully")
return nil
}
//prepareStressor will set the required stressors for the given experiment
// prepareStressor will set the required stressors for the given experiment
func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []string {
stressArgs := []string{
@@ -235,10 +358,11 @@ func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []str
log.InfoWithValues("[Info]: Details of Stressor:", logrus.Fields{
"CPU Core": experimentDetails.CPUcores,
"CPU Load": experimentDetails.CPULoad,
"Timeout": experimentDetails.ChaosDuration,
})
stressArgs = append(stressArgs, "--cpu "+strconv.Itoa(experimentDetails.CPUcores))
stressArgs = append(stressArgs, " --cpu-load "+strconv.Itoa(experimentDetails.CPULoad))
stressArgs = append(stressArgs, "--cpu "+experimentDetails.CPUcores)
stressArgs = append(stressArgs, " --cpu-load "+experimentDetails.CPULoad)
case "pod-memory-stress":
@@ -247,22 +371,22 @@ func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []str
"Memory Consumption": experimentDetails.MemoryConsumption,
"Timeout": experimentDetails.ChaosDuration,
})
stressArgs = append(stressArgs, "--vm "+strconv.Itoa(experimentDetails.NumberOfWorkers)+" --vm-bytes "+strconv.Itoa(experimentDetails.MemoryConsumption)+"M")
stressArgs = append(stressArgs, "--vm "+experimentDetails.NumberOfWorkers+" --vm-bytes "+experimentDetails.MemoryConsumption+"M")
case "pod-io-stress":
var hddbytes string
if experimentDetails.FilesystemUtilizationBytes == 0 {
if experimentDetails.FilesystemUtilizationPercentage == 0 {
if experimentDetails.FilesystemUtilizationBytes == "0" {
if experimentDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = "10%"
log.Info("Neither of FilesystemUtilizationPercentage or FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = strconv.Itoa(experimentDetails.FilesystemUtilizationPercentage) + "%"
hddbytes = experimentDetails.FilesystemUtilizationPercentage + "%"
}
} else {
if experimentDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = strconv.Itoa(experimentDetails.FilesystemUtilizationBytes) + "G"
if experimentDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = experimentDetails.FilesystemUtilizationBytes + "G"
} else {
hddbytes = strconv.Itoa(experimentDetails.FilesystemUtilizationPercentage) + "%"
hddbytes = experimentDetails.FilesystemUtilizationPercentage + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
@@ -274,87 +398,42 @@ func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []str
"Volume Mount Path": experimentDetails.VolumeMountPath,
})
if experimentDetails.VolumeMountPath == "" {
stressArgs = append(stressArgs, "--io "+strconv.Itoa(experimentDetails.NumberOfWorkers)+" --hdd "+strconv.Itoa(experimentDetails.NumberOfWorkers)+" --hdd-bytes "+hddbytes)
stressArgs = append(stressArgs, "--io "+experimentDetails.NumberOfWorkers+" --hdd "+experimentDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes)
} else {
stressArgs = append(stressArgs, "--io "+strconv.Itoa(experimentDetails.NumberOfWorkers)+" --hdd "+strconv.Itoa(experimentDetails.NumberOfWorkers)+" --hdd-bytes "+hddbytes+" --temp-path "+experimentDetails.VolumeMountPath)
stressArgs = append(stressArgs, "--io "+experimentDetails.NumberOfWorkers+" --hdd "+experimentDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes+" --temp-path "+experimentDetails.VolumeMountPath)
}
if experimentDetails.CPUcores != 0 {
stressArgs = append(stressArgs, "--cpu %v", strconv.Itoa(experimentDetails.CPUcores))
if experimentDetails.CPUcores != "0" {
stressArgs = append(stressArgs, "--cpu %v", experimentDetails.CPUcores)
}
default:
log.Fatalf("stressor for %v experiment is not suported", experimentDetails.ExperimentName)
log.Fatalf("stressor for %v experiment is not supported", experimentDetails.ExperimentName)
}
return stressArgs
}
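With the tunables now plain strings, the stressor arguments concatenate directly instead of going through `strconv.Itoa`. A hedged sketch of how the pod-cpu-stress flags come together (the `--timeout` flag and the sample values are assumptions for illustration; only `--cpu` and `--cpu-load` appear in the diff above):

```go
package main

import (
	"fmt"
	"strings"
)

// buildCPUStressArgs sketches how prepareStressor assembles stress-ng
// flags for pod-cpu-stress once CPUcores/CPULoad are strings.
// The --timeout flag is a standard stress-ng flag assumed here.
func buildCPUStressArgs(cores, load, duration string) string {
	args := []string{
		"stress-ng",
		"--timeout " + duration + "s",
		"--cpu " + cores,
		"--cpu-load " + load,
	}
	return strings.Join(args, " ")
}

func main() {
	fmt.Println(buildCPUStressArgs("1", "100", "60"))
}
```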
//getPID extract out the PID of the target container
func getPID(experimentDetails *experimentTypes.ExperimentDetails, containerID string) (int, error) {
var PID int
switch experimentDetails.ContainerRuntime {
case "docker":
host := "unix://" + experimentDetails.SocketPath
// deriving pid from the inspect out of target container
out, err := exec.Command("sudo", "docker", "--host", host, "inspect", containerID).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[docker]: Failed to run docker inspect: %s", string(out)))
return 0, err
}
// parsing data from the json output of inspect command
PID, err = parsePIDFromJSON(out, experimentDetails.ContainerRuntime)
if err != nil {
log.Error(fmt.Sprintf("[docker]: Failed to parse json from docker inspect output: %s", string(out)))
return 0, err
}
case "containerd", "crio":
// deriving pid from the inspect out of target container
endpoint := "unix://" + experimentDetails.SocketPath
out, err := exec.Command("sudo", "crictl", "-i", endpoint, "-r", endpoint, "inspect", containerID).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[cri]: Failed to run crictl: %s", string(out)))
return 0, err
}
// parsing data from the json output of inspect command
PID, err = parsePIDFromJSON(out, experimentDetails.ContainerRuntime)
if err != nil {
log.Errorf(fmt.Sprintf("[cri]: Failed to parse json from crictl output: %s", string(out)))
return 0, err
}
default:
return 0, errors.Errorf("%v container runtime not suported", experimentDetails.ContainerRuntime)
}
log.Info(fmt.Sprintf("[Info]: Container ID=%s has process PID=%d", containerID, PID))
return PID, nil
}
//pidPath will get the pid path of the container
func pidPath(pid int) cgroups.Path {
processPath := "/proc/" + strconv.Itoa(pid) + "/cgroup"
paths, err := parseCgroupFile(processPath)
// pidPath will get the pid path of the container
func pidPath(t *targetDetails, index int) cgroups.Path {
processPath := "/proc/" + strconv.Itoa(t.Pids[index]) + "/cgroup"
paths, err := parseCgroupFile(processPath, t, index)
if err != nil {
return getErrorPath(errors.Wrapf(err, "parse cgroup file %s", processPath))
}
return getExistingPath(paths, pid, "")
return getExistingPath(paths, t.Pids[index], "")
}
//parseCgroupFile will read and verify the cgroup file entry of a container
func parseCgroupFile(path string) (map[string]string, error) {
// parseCgroupFile will read and verify the cgroup file entry of a container
func parseCgroupFile(path string, t *targetDetails, index int) (map[string]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, errors.Errorf("unable to parse cgroup file: %v", err)
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to parse cgroup: %s", err.Error())}
}
defer file.Close()
return parseCgroupFromReader(file)
return parseCgroupFromReader(file, t, index)
}
//parseCgroupFromReader will parse the cgroup file from the reader
func parseCgroupFromReader(r io.Reader) (map[string]string, error) {
// parseCgroupFromReader will parse the cgroup file from the reader
func parseCgroupFromReader(r io.Reader, t *targetDetails, index int) (map[string]string, error) {
var (
cgroups = make(map[string]string)
s = bufio.NewScanner(r)
@@ -365,7 +444,7 @@ func parseCgroupFromReader(r io.Reader) (map[string]string, error) {
parts = strings.SplitN(text, ":", 3)
)
if len(parts) < 3 {
return nil, errors.Errorf("invalid cgroup entry: %q", text)
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("invalid cgroup entry: %q", text)}
}
for _, subs := range strings.Split(parts[1], ",") {
if subs != "" {
@@ -374,13 +453,13 @@ func parseCgroupFromReader(r io.Reader) (map[string]string, error) {
}
}
if err := s.Err(); err != nil {
return nil, errors.Errorf("buffer scanner failed: %v", err)
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("buffer scanner failed: %s", err.Error())}
}
return cgroups, nil
}
//getExistingPath will be used to get the existing valid cgroup path
// getExistingPath will be used to get the existing valid cgroup path
func getExistingPath(paths map[string]string, pid int, suffix string) cgroups.Path {
for n, p := range paths {
dest, err := getCgroupDestination(pid, n)
@@ -410,14 +489,14 @@ func getExistingPath(paths map[string]string, pid int, suffix string) cgroups.Pa
}
}
//getErrorPath will give the invalid cgroup path
// getErrorPath will give the invalid cgroup path
func getErrorPath(err error) cgroups.Path {
return func(_ cgroups.Name) (string, error) {
return "", err
}
}
//getCgroupDestination will validate the subsystem with the mountpath in container mountinfo file.
// getCgroupDestination will validate the subsystem with the mountpath in container mountinfo file.
func getCgroupDestination(pid int, subsystem string) (string, error) {
mountinfoPath := fmt.Sprintf("/proc/%d/mountinfo", pid)
file, err := os.Open(mountinfoPath)
@@ -440,69 +519,25 @@ func getCgroupDestination(pid int, subsystem string) (string, error) {
return "", errors.Errorf("no destination found for %v ", subsystem)
}
//findValidCgroup will be used to get a valid cgroup path
func findValidCgroup(path cgroups.Path, target string) (string, error) {
// findValidCgroup will be used to get a valid cgroup path
func findValidCgroup(path cgroups.Path, t *targetDetails, index int) (string, error) {
for _, subsystem := range cgroupSubsystemList {
path, err := path(cgroups.Name(subsystem))
if err != nil {
log.Errorf("fail to retrieve the cgroup path, subsystem: %v, target: %v, err: %v", subsystem, target, err)
log.Errorf("fail to retrieve the cgroup path, subsystem: %v, target: %v, err: %v", subsystem, t.ContainerIds[index], err)
continue
}
if strings.Contains(path, target) {
if strings.Contains(path, t.ContainerIds[index]) {
return path, nil
}
}
return "", errors.Errorf("never found valid cgroup for %s", target)
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: "could not find valid cgroup"}
}
//parsePIDFromJSON extract the pid from the json output
func parsePIDFromJSON(j []byte, runtime string) (int, error) {
var pid int
switch runtime {
case "docker":
// in docker, pid is present inside state.pid attribute of inspect output
var resp []common.DockerInspectResponse
if err := json.Unmarshal(j, &resp); err != nil {
return 0, err
}
pid = resp[0].State.PID
case "containerd":
var resp common.CrictlInspectResponse
if err := json.Unmarshal(j, &resp); err != nil {
return 0, err
}
pid = resp.Info.PID
case "crio":
var info common.InfoDetails
if err := json.Unmarshal(j, &info); err != nil {
return 0, err
}
pid = info.PID
if pid == 0 {
var resp common.CrictlInspectResponse
if err := json.Unmarshal(j, &resp); err != nil {
return 0, err
}
pid = resp.Info.PID
}
default:
return 0, errors.Errorf("[cri]: No supported container runtime, runtime: %v", runtime)
}
if pid == 0 {
return 0, errors.Errorf("[cri]: No running target container found, pid: %d", pid)
}
return pid, nil
}
//getENV fetches all the env variables from the runner pod
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@@ -510,18 +545,18 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.CPUcores, _ = strconv.Atoi(types.Getenv("CPU_CORES", ""))
experimentDetails.CPULoad, _ = strconv.Atoi(types.Getenv("CPU_LOAD", ""))
experimentDetails.FilesystemUtilizationPercentage, _ = strconv.Atoi(types.Getenv("FILESYSTEM_UTILIZATION_PERCENTAGE", ""))
experimentDetails.FilesystemUtilizationBytes, _ = strconv.Atoi(types.Getenv("FILESYSTEM_UTILIZATION_BYTES", ""))
experimentDetails.NumberOfWorkers, _ = strconv.Atoi(types.Getenv("NUMBER_OF_WORKERS", ""))
experimentDetails.MemoryConsumption, _ = strconv.Atoi(types.Getenv("MEMORY_CONSUMPTION", ""))
experimentDetails.CPUcores = types.Getenv("CPU_CORES", "")
experimentDetails.CPULoad = types.Getenv("CPU_LOAD", "")
experimentDetails.FilesystemUtilizationPercentage = types.Getenv("FILESYSTEM_UTILIZATION_PERCENTAGE", "")
experimentDetails.FilesystemUtilizationBytes = types.Getenv("FILESYSTEM_UTILIZATION_BYTES", "")
experimentDetails.NumberOfWorkers = types.Getenv("NUMBER_OF_WORKERS", "")
experimentDetails.MemoryConsumption = types.Getenv("MEMORY_CONSUMPTION", "")
experimentDetails.VolumeMountPath = types.Getenv("VOLUME_MOUNT_PATH", "")
experimentDetails.StressType = types.Getenv("STRESS_TYPE", "")
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(targetPID int, resultName, chaosNS, targetPodName string) {
func abortWatcher(targets []*targetDetails, resultName, chaosNS string) {
<-abort
@@ -530,15 +565,133 @@ func abortWatcher(targetPID int, resultName, chaosNS, targetPodName string) {
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err = terminateProcess(targetPID); err != nil {
log.Errorf("unable to kill stress process, err :%v", err)
for _, t := range targets {
if err = terminateProcess(t); err != nil {
log.Errorf("[Abort]: unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("[Abort]: Unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", targetPodName); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getCGroupManager will return the cgroup for the given pid of the process
func getCGroupManager(t *targetDetails, index int) (interface{}, error, string) {
if cgroups.Mode() == cgroups.Unified {
groupPath := ""
output, err := exec.Command("bash", "-c", fmt.Sprintf("nsenter -t 1 -C -m -- cat /proc/%v/cgroup", t.Pids[index])).CombinedOutput()
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to get the cgroup: %s :%v", err.Error(), output)}, ""
}
log.Infof("cgroup output: %s", string(output))
parts := strings.Split(string(output), ":")
if len(parts) < 3 {
return "", fmt.Errorf("invalid cgroup entry: %s", string(output)), ""
}
if strings.HasSuffix(parts[len(parts)-3], "0") && parts[len(parts)-2] == "" {
groupPath = parts[len(parts)-1]
}
log.Infof("group path: %s", groupPath)
cgroup2, err := cgroupsv2.LoadManager("/sys/fs/cgroup", string(groupPath))
if err != nil {
return nil, errors.Errorf("Error loading cgroup v2 manager, %v", err), ""
}
return cgroup2, nil, groupPath
}
path := pidPath(t, index)
cgroup, err := findValidCgroup(path, t, index)
if err != nil {
return nil, stacktrace.Propagate(err, "could not find valid cgroup"), ""
}
cgroup1, err := cgroups.Load(cgroups.V1, cgroups.StaticPath(cgroup))
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to load the cgroup: %s", err.Error())}, ""
}
return cgroup1, nil, ""
}
// addProcessToCgroup will add the process to cgroup
// By default it will add to v1 cgroup
func addProcessToCgroup(pid int, control interface{}, groupPath string) error {
if cgroups.Mode() == cgroups.Unified {
args := []string{"-t", "1", "-C", "--", "sudo", "sh", "-c", fmt.Sprintf("echo %d >> /sys/fs/cgroup%s/cgroup.procs", pid, strings.ReplaceAll(groupPath, "\n", ""))}
output, err := exec.Command("nsenter", args...).CombinedOutput()
if err != nil {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("failed to add process to cgroup %s: %v", string(output), err),
}
}
return nil
}
var cgroup1 = control.(cgroups.Cgroup)
return cgroup1.Add(cgroups.Process{Pid: pid})
}
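On cgroup v2 hosts the helper cannot use the v1 `cgroups.Cgroup.Add` API, so it shells out via `nsenter` and appends the pid to the unified hierarchy's `cgroup.procs` file. A sketch of just the command-string construction, mirroring the `fmt.Sprintf` above (`cgroupProcsCommand` is an illustrative helper; the sample path is made up):

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupProcsCommand builds the shell command addProcessToCgroup runs
// under nsenter on cgroup v2: append the stress pid to the target
// group's cgroup.procs. groupPath comes from /proc/<pid>/cgroup and may
// carry a trailing newline, hence the ReplaceAll.
func cgroupProcsCommand(pid int, groupPath string) string {
	return fmt.Sprintf("echo %d >> /sys/fs/cgroup%s/cgroup.procs", pid, strings.ReplaceAll(groupPath, "\n", ""))
}

func main() {
	fmt.Println(cgroupProcsCommand(4321, "/kubepods.slice/pod42\n"))
}
```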
func injectChaos(t *targetDetails, stressors string, index int, stressType string) (*Command, error) {
stressCommand := fmt.Sprintf("pause nsutil -t %v -p -- %v", strconv.Itoa(t.Pids[index]), stressors)
// for io stress,we need to enter into mount ns of the target container
// enabling it by passing -m flag
if stressType == "pod-io-stress" {
stressCommand = fmt.Sprintf("pause nsutil -t %v -p -m -- %v", strconv.Itoa(t.Pids[index]), stressors)
}
log.Infof("[Info]: starting process: %v", stressCommand)
// launch the stress-ng process on the target container in paused mode
cmd := exec.Command("/bin/bash", "-c", stressCommand)
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
var buf bytes.Buffer
cmd.Stdout = &buf
cmd.Stderr = &buf
if err := cmd.Start(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("failed to start stress process: %s", err.Error())}
}
// add the stress process to the cgroup of target container
if err = addProcessToCgroup(cmd.Process.Pid, t.CGroupManagers[index], t.GroupPath); err != nil {
if killErr := cmd.Process.Kill(); killErr != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup %s and kill stress process: %s", err.Error(), killErr.Error())}
}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup: %s", err.Error())}
}
log.Info("[Info]: Sending signal to resume the stress process")
// wait for the process to start before sending the resume signal
// TODO: need a dynamic way to check the start of the process
time.Sleep(700 * time.Millisecond)
// remove pause and resume or start the stress process
if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to remove pause and start the stress process: %s", err.Error())}
}
return &Command{
Cmd: cmd,
Buffer: buf,
}, nil
}
type targetDetails struct {
Name string
Namespace string
TargetContainers []string
ContainerIds []string
Pids []int
CGroupManagers []interface{}
Cmds []*Command
Source string
GroupPath string
}
type Command struct {
Cmd *exec.Cmd
Buffer bytes.Buffer
}


@@ -1,41 +1,74 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareAndInjectStressChaos contains the prepration & injection steps for the stress experiments.
func PrepareAndInjectStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareAndInjectStressChaos contains the preparation & injection steps for the stress experiments.
func PrepareAndInjectStressChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodStressFault")
defer span.End()
var err error
//Set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
switch experimentsDetails.StressType {
case "pod-cpu-stress":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"CPU Core": experimentsDetails.CPUcores,
"CPU Load Percentage": experimentsDetails.CPULoad,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "pod-memory-stress":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Number of Workers": experimentsDetails.NumberOfWorkers,
"Memory Consumption": experimentsDetails.MemoryConsumption,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "pod-io-stress":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("Please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("[Info]: Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@@ -46,48 +79,41 @@ func PrepareAndInjectStressChaos(experimentsDetails *experimentTypes.ExperimentD
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
}
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode inject the stress chaos in all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -95,112 +121,79 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// creating the helper pod to perform the stress chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
-	runID := common.GetRunID()
-	if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
-		return errors.Errorf("unable to create the helper pod, err: %v", err)
+	runID := stringutils.GetRunID()
+	if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
+		return stacktrace.Propagate(err, "could not create helper pod")
}
-	appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
+	appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
-	if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
-		common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
-		return errors.Errorf("helper pods are not in running state, err: %v", err)
-	}
-	// Wait till the completion of the helper pod
-	// set an upper limit for the waiting time
-	log.Info("[Wait]: waiting till the completion of the helper pod")
-	podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
-	if err != nil || podStatus == "Failed" {
-		common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
-		return common.HelperFailedError(err)
-	}
-	//Deleting all the helper pod for stress chaos
-	log.Info("[Cleanup]: Deleting the helper pod")
-	err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
-	if err != nil {
-		return errors.Errorf("unable to delete the helper pods, err: %v", err)
+	if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
+		return err
}
}
return nil
}
// injectChaosInParallelMode inject the stress chaos in all target application in parallel mode (all at once)
-func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
-	labelSuffix := common.GetRunID()
+func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInParallelMode")
+	defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-	if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+	if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform stress chaos
-	for _, pod := range targetPodList.Items {
-		log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
-			"PodName":       pod.Name,
-			"NodeName":      pod.Spec.NodeName,
-			"ContainerName": experimentsDetails.TargetContainer,
-		})
-		runID := common.GetRunID()
-		err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
-		if err != nil {
-			return errors.Errorf("unable to create the helper pod, err: %v", err)
-		}
-	}
+	runID := stringutils.GetRunID()
+	targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
+	for node, tar := range targets {
+		var targetsPerNode []string
+		for _, k := range tar.Target {
+			targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
+		}
+		if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
+			return stacktrace.Propagate(err, "could not create helper pod")
+		}
}
-	appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
+	appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
-	if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
-		common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
-		return errors.Errorf("helper pods are not in running state, err: %v", err)
-	}
-	// Wait till the completion of the helper pod
-	// set an upper limit for the waiting time
-	log.Info("[Wait]: waiting till the completion of the helper pod")
-	podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
-	if err != nil || podStatus == "Failed" {
-		common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
-		return common.HelperFailedError(err)
-	}
-	//Deleting all the helper pod for stress chaos
-	log.Info("[Cleanup]: Deleting all the helper pod")
-	err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
-	if err != nil {
-		return errors.Errorf("unable to delete the helper pods, err: %v", err)
+	if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
+		return err
}
return nil
}
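The rewritten helpers no longer receive a single pod name: each target is serialized as `name:namespace:container` and the per-node entries are joined with `;`, as the `fmt.Sprintf` and `strings.Join` calls above show. A self-contained sketch of that encoding plus the inverse parse a helper pod would need (the `target` struct and both function names are illustrative, not the repo's types):

```go
package main

import (
	"fmt"
	"strings"
)

// target mirrors only the fields the diff serializes; the struct is illustrative.
type target struct {
	Name, Namespace, Container string
}

// encodeTargets renders targets as "name:namespace:container" entries joined by ";",
// matching the format the per-node loop above builds.
func encodeTargets(targets []target) string {
	parts := make([]string, 0, len(targets))
	for _, t := range targets {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	return strings.Join(parts, ";")
}

// decodeTargets is the inverse a helper pod would apply to its TARGETS env var.
func decodeTargets(s string) ([]target, error) {
	var out []target
	for _, entry := range strings.Split(s, ";") {
		fields := strings.Split(entry, ":")
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed target %q", entry)
		}
		out = append(out, target{fields[0], fields[1], fields[2]})
	}
	return out, nil
}

func main() {
	ts := []target{{"nginx-0", "default", "nginx"}, {"nginx-1", "default", "nginx"}}
	enc := encodeTargets(ts)
	fmt.Println(enc)
	back, err := decodeTargets(enc)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(back))
}
```

Packing all targets for a node into one string lets a single helper pod per node handle several containers, which is the point of the multi-container change in this PR.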
// createHelperPod derive the attributes for helper pod and create the helper pod
-func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, labelSuffix string) error {
+func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodStressFaultHelperPod")
+	defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
-	Name:        experimentsDetails.ExperimentName + "-helper-" + runID,
-	Namespace:   experimentsDetails.ChaosNamespace,
-	Labels:      common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
-	Annotations: chaosDetails.Annotations,
+	GenerateName: experimentsDetails.ExperimentName + "-helper-",
+	Namespace:    experimentsDetails.ChaosNamespace,
+	Labels:       common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
+	Annotations:  chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
@@ -242,7 +235,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name stress-chaos",
},
Resources: chaosDetails.Resources,
-	Env:          getPodEnv(experimentsDetails, podName),
+	Env:          getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "socket-path",
@@ -258,11 +251,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
RunAsUser: ptrint64(0),
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_PTRACE",
"SYS_ADMIN",
"MKNOD",
"SYS_CHROOT",
"KILL",
},
},
},
@@ -271,18 +260,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
-	_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
-	return err
+	if len(chaosDetails.SideCar) != 0 {
+		helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
+		helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
+	}
+	if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
+		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
+	}
+	return nil
}
// getPodEnv derive all the env required for the helper pod
-func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
+func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
	var envDetails common.ENVDetails
-	envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
-		SetEnv("APP_POD", podName).
-		SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
+	envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@@ -290,15 +284,17 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
-		SetEnv("CPU_CORES", strconv.Itoa(experimentsDetails.CPUcores)).
-		SetEnv("CPU_LOAD", strconv.Itoa(experimentsDetails.CPULoad)).
-		SetEnv("FILESYSTEM_UTILIZATION_PERCENTAGE", strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage)).
-		SetEnv("FILESYSTEM_UTILIZATION_BYTES", strconv.Itoa(experimentsDetails.FilesystemUtilizationBytes)).
-		SetEnv("NUMBER_OF_WORKERS", strconv.Itoa(experimentsDetails.NumberOfWorkers)).
-		SetEnv("MEMORY_CONSUMPTION", strconv.Itoa(experimentsDetails.MemoryConsumption)).
+		SetEnv("CPU_CORES", experimentsDetails.CPUcores).
+		SetEnv("CPU_LOAD", experimentsDetails.CPULoad).
+		SetEnv("FILESYSTEM_UTILIZATION_PERCENTAGE", experimentsDetails.FilesystemUtilizationPercentage).
+		SetEnv("FILESYSTEM_UTILIZATION_BYTES", experimentsDetails.FilesystemUtilizationBytes).
+		SetEnv("NUMBER_OF_WORKERS", experimentsDetails.NumberOfWorkers).
+		SetEnv("MEMORY_CONSUMPTION", experimentsDetails.MemoryConsumption).
SetEnv("VOLUME_MOUNT_PATH", experimentsDetails.VolumeMountPath).
SetEnv("STRESS_TYPE", experimentsDetails.StressType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
@@ -307,3 +303,16 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
func ptrint64(p int64) *int64 {
return &p
}
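getPodEnv builds its list through common.ENVDetails, whose SetEnv returns the receiver so the calls chain as shown above. A stand-alone sketch of that builder pattern (the types here are illustrative stand-ins for the repo's ENVDetails and Kubernetes' EnvVar, not the actual implementation):

```go
package main

import "fmt"

// envVar mirrors the shape of a Kubernetes EnvVar for illustration.
type envVar struct {
	Name, Value string
}

// envDetails is a chainable builder in the style of common.ENVDetails.
type envDetails struct {
	ENV []envVar
}

// SetEnv appends a variable and returns the receiver, enabling chained calls.
func (e *envDetails) SetEnv(key, value string) *envDetails {
	e.ENV = append(e.ENV, envVar{Name: key, Value: value})
	return e
}

func main() {
	var env envDetails
	env.SetEnv("TARGETS", "nginx-0:default:nginx").
		SetEnv("TOTAL_CHAOS_DURATION", "60").
		SetEnv("STRESS_TYPE", "pod-cpu-stress")
	for _, v := range env.ENV {
		fmt.Printf("%s=%s\n", v.Name, v.Value)
	}
}
```

Returning the pointer receiver is what makes the long dotted chains in getPodEnv possible; the repo's version may additionally filter or transform values.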
// SetChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.CPUcores = common.ValidateRange(experimentsDetails.CPUcores)
experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)
experimentsDetails.MemoryConsumption = common.ValidateRange(experimentsDetails.MemoryConsumption)
experimentsDetails.NumberOfWorkers = common.ValidateRange(experimentsDetails.NumberOfWorkers)
experimentsDetails.FilesystemUtilizationPercentage = common.ValidateRange(experimentsDetails.FilesystemUtilizationPercentage)
experimentsDetails.FilesystemUtilizationBytes = common.ValidateRange(experimentsDetails.FilesystemUtilizationBytes)
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}


@@ -1,28 +1,34 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
-	clients "github.com/litmuschaos/litmus-go/pkg/clients"
+	"github.com/litmuschaos/litmus-go/pkg/cerrors"
+	"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/vmware"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/vmware/vm-poweroff/types"
-	"github.com/pkg/errors"
+	"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var inject, abort chan os.Signal
// InjectVMPowerOffChaos injects the chaos in serial or parallel mode
-func InjectVMPowerOffChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, cookie string) error {
+func InjectVMPowerOffChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, cookie string) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMPowerOffFault")
+	defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
@@ -47,15 +53,15 @@ func InjectVMPowerOffChaos(experimentsDetails *experimentTypes.ExperimentDetails
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
-	if err := injectChaosInSerialMode(experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-		return err
+	if err := injectChaosInSerialMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+		return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
-	if err := injectChaosInParallelMode(experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-		return err
+	if err := injectChaosInParallelMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+		return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
-	return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+	return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -68,7 +74,10 @@ func InjectVMPowerOffChaos(experimentsDetails *experimentTypes.ExperimentDetails
}
// injectChaosInSerialMode stops VMs in serial mode i.e. one after the other
-func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInSerialMode")
+	defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
@@ -93,7 +102,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
-	return errors.Errorf("failed to stop %s vm: %s", vmId, err.Error())
+	return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
}
common.SetTargets(vmId, "injected", "VM", chaosDetails)
@@ -101,14 +110,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
-	return errors.Errorf("vm %s failed to successfully shutdown, err: %s", vmId, err.Error())
+	return stacktrace.Propagate(err, "VM shutdown failed")
}
//Run the probes during the chaos
//The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
-	if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-		return err
+	if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+		return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -119,13 +128,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
-	return errors.Errorf("failed to start back %s vm: %s", vmId, err.Error())
+	return stacktrace.Propagate(err, "failed to start back vm")
}
//Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
-	return errors.Errorf("vm %s failed to successfully start, err: %s", vmId, err.Error())
+	return stacktrace.Propagate(err, "vm failed to start")
}
common.SetTargets(vmId, "reverted", "VM", chaosDetails)
@@ -139,7 +148,9 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode stops VMs in parallel mode i.e. all at once
-func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInParallelMode")
+	defer span.End()
select {
case <-inject:
@@ -165,7 +176,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
-	return errors.Errorf("failed to stop %s vm: %s", vmId, err.Error())
+	return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
}
common.SetTargets(vmId, "injected", "VM", chaosDetails)
@@ -176,14 +187,14 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
-	return errors.Errorf("vm %s failed to successfully shutdown, err: %s", vmId, err.Error())
+	return stacktrace.Propagate(err, "vm failed to shutdown")
}
}
//Running the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-	if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-		return err
+	if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+		return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -196,7 +207,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
-	return errors.Errorf("failed to start back %s vm: %s", vmId, err.Error())
+	return stacktrace.Propagate(err, fmt.Sprintf("failed to start back %s vm", vmId))
}
}
@@ -205,7 +216,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
-	return errors.Errorf("vm %s failed to successfully start, err: %s", vmId, err.Error())
+	return stacktrace.Propagate(err, "vm failed to successfully start")
}
}


@@ -1,271 +0,0 @@
package lib
import (
"strconv"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/openebs/maya/pkg/util/retry"
"github.com/pkg/errors"
appsv1 "k8s.io/api/apps/v1"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PreparePodDelete contains the prepration steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.ChaosServiceAccount == "" {
// Getting the serviceAccountName for the powerfulseal pod
err := GetServiceAccount(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
}
}
// generating a unique string which can be appended with the powerfulseal deployment name & labels for the uniquely identification
runID := common.GetRunID()
// generating the chaos inject event in the chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Creating configmap for powerfulseal deployment
err := CreateConfigMap(experimentsDetails, clients, runID)
if err != nil {
return err
}
// Creating powerfulseal deployment
err = CreatePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
//checking the status of the powerfulseal pod, wait till the powerfulseal pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "name=powerfulseal-"+runID, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("powerfulseal pod is not in running state, err: %v", err)
}
// Wait for Chaos Duration
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
//Deleting the powerfulseal deployment
log.Info("[Cleanup]: Deleting the powerfulseal deployment")
err = DeletePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal deployment, err: %v", err)
}
//Deleting the powerfulseal configmap
log.Info("[Cleanup]: Deleting the powerfulseal configmap")
err = DeletePowerfulsealConfigmap(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal configmap, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// GetServiceAccount find the serviceAccountName for the powerfulseal deployment
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
// CreateConfigMap creates a configmap for the powerfulseal deployment
func CreateConfigMap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
data := map[string]string{}
// It will store all the details inside a string in well formated way
policy := GetConfigMapData(experimentsDetails)
data["policy"] = policy
configMap := &apiv1.ConfigMap{
ObjectMeta: v1.ObjectMeta{
Name: "policy-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"name": "policy-" + runID,
},
},
Data: data,
}
_, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Create(configMap)
return err
}
// GetConfigMapData generates the configmap data for the powerfulseal deployments in desired format format
func GetConfigMapData(experimentsDetails *experimentTypes.ExperimentDetails) string {
waitTime, _ := strconv.Atoi(experimentsDetails.ChaosInterval)
policy := "config:" + "\n" +
" minSecondsBetweenRuns: 1" + "\n" +
" maxSecondsBetweenRuns: " + strconv.Itoa(waitTime) + "\n" +
"podScenarios:" + "\n" +
" - name: \"delete random pods in application namespace\"" + "\n" +
" match:" + "\n" +
" - labels:" + "\n" +
" namespace: " + experimentsDetails.AppNS + "\n" +
" selector: " + experimentsDetails.AppLabel + "\n" +
" filters:" + "\n" +
" - randomSample:" + "\n" +
" size: 1" + "\n" +
" actions:" + "\n" +
" - kill:" + "\n" +
" probability: 0.77" + "\n" +
" force: " + strconv.FormatBool(experimentsDetails.Force)
return policy
}
// CreatePowerfulsealDeployment derive the attributes for powerfulseal deployment and create it
func CreatePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
deployment := &appsv1.Deployment{
ObjectMeta: v1.ObjectMeta{
Name: "powerfulseal-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": "powerfulseal",
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
},
Spec: appsv1.DeploymentSpec{
Selector: &v1.LabelSelector{
MatchLabels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Replicas: func(i int32) *int32 { return &i }(1),
Template: apiv1.PodTemplateSpec{
ObjectMeta: v1.ObjectMeta{
Labels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Spec: apiv1.PodSpec{
Volumes: []apiv1.Volume{
{
Name: "policyfile",
VolumeSource: apiv1.VolumeSource{
ConfigMap: &apiv1.ConfigMapVolumeSource{
LocalObjectReference: apiv1.LocalObjectReference{
Name: "policy-" + runID,
},
},
},
},
},
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: func(i int64) *int64 { return &i }(0),
Containers: []apiv1.Container{
{
Name: "powerfulseal",
Image: "ksatchit/miko-powerfulseal:non-ssh",
Args: []string{
"autonomous",
"--inventory-kubernetes",
"--no-cloud",
"--policy-file=/root/policy_kill_random_default.yml",
"--use-pod-delete-instead-of-ssh-kill",
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "policyfile",
MountPath: "/root/policy_kill_random_default.yml",
SubPath: "policy",
},
},
},
},
},
},
},
}
_, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Create(deployment)
return err
}
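The Replicas and TerminationGracePeriodSeconds fields above take pointers, which the code obtains via inline closures like `func(i int32) *int32 { return &i }(1)` (and via ptrint64 in the stress lib). With Go 1.18+ generics, one helper covers every such field; a minimal sketch (`ptr` is an illustrative name, not a repo function):

```go
package main

import "fmt"

// ptr returns a pointer to any value, replacing per-type helpers such as
// ptrint64 and the inline func(i int32) *int32 { return &i }(1) closures.
func ptr[T any](v T) *T {
	return &v
}

func main() {
	replicas := ptr(int32(1)) // *int32, as DeploymentSpec.Replicas expects
	grace := ptr(int64(0))    // *int64, as TerminationGracePeriodSeconds expects
	fmt.Println(*replicas, *grace)
}
```

The closure trick exists because Go forbids taking the address of a literal directly; either form works, the generic helper just avoids repeating it per type.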
//DeletePowerfulsealDeployment delete the powerfulseal deployment
func DeletePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Delete("powerfulseal-"+runID, &v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).List(v1.ListOptions{LabelSelector: "name=powerfulseal-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("Deployment is not deleted yet, err: %v", err)
}
return nil
})
return err
}
//DeletePowerfulsealConfigmap delete the powerfulseal configmap
func DeletePowerfulsealConfigmap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Delete("policy-"+runID, &v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).List(v1.ListOptions{LabelSelector: "name=policy-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("configmap is not deleted yet, err: %v", err)
}
return nil
})
return err
}
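Both Delete* helpers above share a shape: issue the delete, then poll with retry.Times(90).Wait(1 * time.Second) until the list call reports no remaining items. A dependency-free sketch of that poll loop; listFn stands in for the Kubernetes list call, and waitForDeletion is an illustrative name:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForDeletion polls listFn until it reports zero remaining items or the
// retry budget (attempts x interval) is exhausted, mirroring the
// retry.Times(90).Wait(1 * time.Second) loop in the helpers above.
func waitForDeletion(attempts int, interval time.Duration, listFn func() (int, error)) error {
	for i := 0; i < attempts; i++ {
		n, err := listFn()
		if err == nil && n == 0 {
			return nil
		}
		time.Sleep(interval)
	}
	return errors.New("resource is not deleted yet")
}

func main() {
	remaining := 3
	// Simulated list call: one item disappears per poll.
	listFn := func() (int, error) {
		if remaining > 0 {
			remaining--
		}
		return remaining, nil
	}
	if err := waitForDeletion(90, time.Millisecond, listFn); err != nil {
		panic(err)
	}
	fmt.Println("deleted")
}
```

Polling after delete matters because Kubernetes deletion is asynchronous: the Delete call returning nil only means the request was accepted, not that the objects are gone.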


@@ -1,344 +0,0 @@
package lib
import (
"strconv"
"strings"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareContainerKill contains the prepration steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode kills the container of each target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
//getRestartCount returns the restart count of the target container
restartCountBefore := getRestartCount(pod, experimentsDetails.TargetContainer)
log.Infof("restartCount of target container before chaos injection: %v", restartCountBefore)
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"Target Container": experimentsDetails.TargetContainer,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// verify that the restart count of the target container increased after chaos injection
if err := verifyRestartCount(experimentsDetails, pod, clients, restartCountBefore); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("target container is not restarted, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode kills the containers of all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
//getRestartCountAll returns the restart counts of the target containers
restartCountBefore := getRestartCountAll(targetPodList, experimentsDetails.TargetContainer)
log.Infof("restartCount of target containers before chaos injection: %v", restartCountBefore)
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"Target Container": experimentsDetails.TargetContainer,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// verify that the restart count of each target container increased after chaos injection
if err := verifyRestartCountAll(experimentsDetails, targetPodList, clients, restartCountBefore); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("target container is not restarted, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
//getRestartCount returns the restart count of the target container
func getRestartCount(targetPod apiv1.Pod, containerName string) int {
restartCount := 0
for _, container := range targetPod.Status.ContainerStatuses {
if container.Name == containerName {
restartCount = int(container.RestartCount)
break
}
}
return restartCount
}
//getRestartCountAll returns the restart counts of all target containers
func getRestartCountAll(targetPodList apiv1.PodList, containerName string) []int {
restartCount := []int{}
for _, pod := range targetPodList.Items {
restartCount = append(restartCount, getRestartCount(pod, containerName))
}
return restartCount
}
//verifyRestartCount verifies that the target container was restarted after chaos injection
// i.e. the restart count of the container should increase after chaos injection
func verifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, pod apiv1.Pod, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0
err := retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(pod.Name, v1.GetOptions{})
if err != nil {
return err
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
restartCountAfter = int(container.RestartCount)
break
}
}
return nil
})
if err != nil {
return err
}
// it will fail if restart count won't increase
if restartCountAfter <= restartCountBefore {
return errors.Errorf("target container is not restarted")
}
log.Infof("restartCount of target container after chaos injection: %v", restartCountAfter)
return nil
}
//verifyRestartCountAll verifies that every target container was restarted after chaos injection
// i.e. the restart count of each container should increase after chaos injection
func verifyRestartCountAll(experimentsDetails *experimentTypes.ExperimentDetails, podList apiv1.PodList, clients clients.ClientSets, restartCountBefore []int) error {
for index, pod := range podList.Items {
if err := verifyRestartCount(experimentsDetails, pod, clients, restartCountBefore[index]); err != nil {
return err
}
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"sudo",
"-E",
},
Args: []string{
"pumba",
"--random",
"--interval",
strconv.Itoa(experimentsDetails.ChaosInterval) + "s",
"kill",
"--signal",
experimentsDetails.Signal,
"re2:k8s_" + experimentsDetails.TargetContainer + "_" + appName,
},
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}


@ -1,274 +0,0 @@
package lib
import (
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodCPUHog contains the preparation steps before chaos injection
func PreparePodCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stresses the CPU of each target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform cpu chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"CPUcores": experimentsDetails.CPUcores,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode stresses the CPU of all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform cpu chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"CPUcores": experimentsDetails.CPUcores,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"sudo",
"-E",
},
Args: getContainerArguments(experimentsDetails, appName),
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"pumba",
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stress-image",
experimentsDetails.StressImage,
"--stressors",
"--cpu " + strconv.Itoa(experimentsDetails.CPUcores) + " --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}
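The argument list that `getContainerArguments` builds can be exercised in isolation; the sketch below reproduces it with hypothetical pod and stress-image names (the image name is an illustrative placeholder, not a value defined by this code):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// stressArgs mirrors getContainerArguments above: pumba matches the target
// pod by its io.kubernetes.pod.name label and runs a stress image with the
// given CPU stressors for the chaos duration.
func stressArgs(podName, stressImage string, cpuCores, durationSeconds int) []string {
	return []string{
		"pumba",
		"--log-level", "debug",
		"--label", "io.kubernetes.pod.name=" + podName,
		"stress",
		"--duration", strconv.Itoa(durationSeconds) + "s",
		"--stress-image", stressImage,
		"--stressors",
		"--cpu " + strconv.Itoa(cpuCores) + " --timeout " + strconv.Itoa(durationSeconds) + "s",
	}
}

func main() {
	// hypothetical values for illustration
	fmt.Println(strings.Join(stressArgs("web-0", "example/stress-ng:latest", 1, 60), " "))
}
```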


@ -1,275 +0,0 @@
package lib
import (
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodMemoryHog contains the preparation steps before chaos injection
func PreparePodMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stresses the memory of each target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform memory chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"MemoryBytes": experimentsDetails.MemoryConsumption,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode stresses the memory of all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform memory chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"MemoryBytes": experimentsDetails.MemoryConsumption,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"sudo",
"-E",
},
Args: getContainerArguments(experimentsDetails, appName),
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"pumba",
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stress-image",
experimentsDetails.StressImage,
"--stressors",
"--cpu 1 --vm 1 --vm-bytes " + strconv.Itoa(experimentsDetails.MemoryConsumption) + "M --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}

@ -1,43 +0,0 @@
package corruption
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "corrupt", "--percent", strconv.Itoa(experimentsDetails.NetworkPacketCorruptionPercentage))
return args, nil
}

@ -1,43 +0,0 @@
package duplication
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "duplicate", "--percent", strconv.Itoa(experimentsDetails.NetworkPacketDuplicationPercentage))
return args, nil
}

@ -1,43 +0,0 @@
package latency
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "delay", "--time", strconv.Itoa(experimentsDetails.NetworkLatency))
return args, nil
}

@ -1,43 +0,0 @@
package loss
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "loss", "--percent", strconv.Itoa(experimentsDetails.NetworkPacketLossPercentage))
return args, nil
}


@ -1,269 +0,0 @@
package lib
import (
"strings"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareAndInjectChaos contains the preparation and chaos injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args []string) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode injects the network chaos on all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args []string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
})
// args contains details of the specific chaos injection
// constructing `argsWithRegex` from a copy of args with a pod-specific regex,
// so that repeated appends cannot mutate the shared args slice's backing array
argsWithRegex := append(append([]string(nil), args...), "re2:k8s_POD_"+pod.Name+"_"+experimentsDetails.AppNS)
log.Infof("Arguments for running %v are %v", experimentsDetails.ExperimentName, argsWithRegex)
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Spec.NodeName, runID, argsWithRegex, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, chaosDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode injects the network chaos on all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args []string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
})
// args contains details of the specific chaos injection
// constructing `argsWithRegex` from a copy of args with a pod-specific regex,
// so that repeated appends cannot mutate the shared args slice's backing array
argsWithRegex := append(append([]string(nil), args...), "re2:k8s_POD_"+pod.Name+"_"+experimentsDetails.AppNS)
log.Infof("Arguments for running %v are %v", experimentsDetails.ExperimentName, argsWithRegex)
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Spec.NodeName, runID, argsWithRegex, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, chaosDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName, runID string, args []string, labelSuffix string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"sudo",
"-E",
},
Args: args,
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// AddTargetIpsArgs inserts a comma-separated list of targetIPs (if provided by the user) into the pumba command/args
func AddTargetIpsArgs(targetIPs, targetHosts string, args []string) ([]string, error) {
targetIPs, err := network_chaos.GetTargetIps(targetIPs, targetHosts)
if err != nil {
return nil, err
}
if targetIPs == "" {
return args, nil
}
ips := strings.Split(targetIPs, ",")
for i := range ips {
args = append(args, "--target", strings.TrimSpace(ips[i]))
}
return args, nil
}

@ -1,298 +0,0 @@
package lib
import (
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodIOStress contains the preparation steps before chaos injection
func PreparePodIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, clients); err != nil {
return err
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode injects the IO stress chaos on all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform io stress chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode injects the IO stress chaos on all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform io stress chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"sudo",
"-E",
},
Args: getContainerArguments(experimentsDetails, appName),
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
var hddbytes string
if experimentsDetails.FilesystemUtilizationBytes == 0 {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = "10%"
log.Info("Neither FilesystemUtilizationPercentage nor FilesystemUtilizationBytes provided, proceeding with the default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
}
} else {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationBytes) + "G"
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
stressArgs := []string{
"pumba",
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stressors",
}
args := stressArgs
if experimentsDetails.VolumeMountPath == "" {
args = append(args, "--cpu 1 --io "+strconv.Itoa(experimentsDetails.NumberOfWorkers)+" --hdd "+strconv.Itoa(experimentsDetails.NumberOfWorkers)+" --hdd-bytes "+hddbytes+" --timeout "+strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
} else {
args = append(args, "--cpu 1 --io "+strconv.Itoa(experimentsDetails.NumberOfWorkers)+" --hdd "+strconv.Itoa(experimentsDetails.NumberOfWorkers)+" --hdd-bytes "+hddbytes+" --temp-path "+experimentsDetails.VolumeMountPath+" --timeout "+strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
}
return args
}

@ -110,6 +110,14 @@ The *generate_experiment.go* script is a simple way to bootstrap your experiment
**Note**: Replace the `<generate-type>` placeholder with the appropriate value based on the use case:
- `experiment`: Chaos experiment artifacts belonging to an existing OR new experiment.
- Provide the type of chaoslib in the `-t` flag. It supports the following values:
- `exec`: It creates the exec-based chaoslib (default type)
- `helper`: It creates the helper-based chaoslib
- `aws`: It creates a sample experiment for the AWS platform.
- `vmware`: It creates a sample experiment for the VMware platform.
- `azure`: It creates a sample experiment for the Azure platform.
- `gcp`: It creates a sample experiment for the GCP platform.
- `chart`: Just the chaos-chart metadata, i.e., chartserviceversion.yaml
- Provide the type of chart in the `-t` flag. It supports the following values:
- `category`: It creates the chart metadata for the category, i.e., chartserviceversion and package manifests
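For example, hypothetical invocations combining the generate type with the `-t` flag (the `attribute.yaml` path is illustrative):

```shell
./litmus-sdk generate experiment -f=attribute.yaml -t=helper
./litmus-sdk generate experiment -f=attribute.yaml -t=aws
./litmus-sdk generate chart -f=attribute.yaml -t=category
```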

@ -13,6 +13,7 @@ func main() {
var (
filePath string
chartType string
libType string
)
var generate = &cobra.Command{
@ -30,7 +31,7 @@ func main() {
Example: "./litmus-sdk generate experiment -f=attribute.yaml",
DisableFlagsInUseLine: true,
Run: func(cmd *cobra.Command, args []string) {
if err := sdkCmd.GenerateExperiment(filePath, chartType, "experiment"); err != nil {
if err := sdkCmd.GenerateExperiment(filePath, chartType, "experiment", libType); err != nil {
log.Fatalf("error: %v", err)
}
fmt.Println("experiment created successfully")
@ -45,7 +46,7 @@ func main() {
Example: "./litmus-sdk generate chart -f=attribute.yaml",
DisableFlagsInUseLine: true,
Run: func(cmd *cobra.Command, args []string) {
if err := sdkCmd.GenerateExperiment(filePath, chartType, "chart"); err != nil {
if err := sdkCmd.GenerateExperiment(filePath, chartType, "chart", libType); err != nil {
log.Fatalf("error: %v", err)
}
fmt.Println("chart created successfully")
@ -53,6 +54,7 @@ func main() {
}
experiment.Flags().StringVarP(&filePath, "file", "f", "", "path of the attribute.yaml manifest")
experiment.Flags().StringVarP(&libType, "type", "t", "exec", "type of the experiment lib")
chart.Flags().StringVarP(&filePath, "file", "f", "", "path of the attribute.yaml manifest")
chart.Flags().StringVarP(&chartType, "type", "t", "all", "type of the chaos chart")
chart.MarkFlagRequired("file")

@ -2,6 +2,7 @@ package cmd
import (
"bytes"
"fmt"
"io/ioutil"
"os"
"os/exec"
@ -17,7 +18,7 @@ import (
)
// GenerateExperiment generates the new/custom chaos experiment based on the specified attribute file
func GenerateExperiment(attributeFile, chartType string, generationType string) error {
func GenerateExperiment(attributeFile, chartType, generationType, libType string) error {
// Fetch all the required attributes from the given file
// Experiment contains all the required attributes
@ -65,17 +66,17 @@ func GenerateExperiment(attributeFile, chartType string, generationType string)
default:
// creating experiment dir & files
if err := createExperiment(experimentRootDIR, experimentDetails); err != nil {
if err := createExperiment(experimentRootDIR, experimentDetails, libType); err != nil {
return err
}
// creating chaoslib dir & files
if err := createChaosLib(litmusRootDir, experimentDetails); err != nil {
if err := createChaosLib(litmusRootDir, experimentDetails, libType); err != nil {
return err
}
// creating envs dir & files
if err := createENV(litmusRootDir, experimentDetails); err != nil {
if err := createENV(litmusRootDir, experimentDetails, libType); err != nil {
return err
}
@ -147,31 +148,43 @@ func copy(src, dest string) error {
}
// createExperiment creates the experiment file
func createExperiment(experimentRootDIR string, experimentDetails types.Experiment) error {
func createExperiment(experimentRootDIR string, experimentDetails types.Experiment, libType string) error {
var experimentTemplateName string
// create the experiment directory, if not present
experimentDIR := experimentRootDIR + "/experiment"
createDirectoryIfNotPresent(experimentDIR)
if libType == "helper" || libType == "exec" {
experimentTemplateName = "./templates/experiment_k8s.tmpl"
} else {
experimentTemplateName = fmt.Sprintf("./templates/experiment_%s.tmpl", libType)
}
// generating the experiment.go file
experimentFilePath := experimentDIR + "/" + experimentDetails.Name + ".go"
return generateFile(experimentDetails, experimentFilePath, "./templates/experiment.tmpl")
return generateFile(experimentDetails, experimentFilePath, experimentTemplateName)
}
// createChaosLib creates the chaoslib for the experiment
func createChaosLib(litmusRootDir string, experimentDetails types.Experiment) error {
func createChaosLib(litmusRootDir string, experimentDetails types.Experiment, libType string) error {
var chaosLibTemplateName string
// create the chaoslib directory, if not present
chaoslibRootDIR := litmusRootDir + "/chaoslib/litmus/" + experimentDetails.Name
createDirectoryIfNotPresent(chaoslibRootDIR)
chaoslibDIR := chaoslibRootDIR + "/lib"
createDirectoryIfNotPresent(chaoslibDIR)
if libType == "aws" || libType == "vmware" || libType == "gcp" || libType == "azure" {
chaosLibTemplateName = "./templates/chaoslib_non-k8s.tmpl"
} else {
chaosLibTemplateName = fmt.Sprintf("./templates/chaoslib_%s.tmpl", libType)
}
// generating the chaoslib file
chaoslibFilePath := chaoslibDIR + "/" + experimentDetails.Name + ".go"
return generateFile(experimentDetails, chaoslibFilePath, "./templates/chaoslib.tmpl")
return generateFile(experimentDetails, chaoslibFilePath, chaosLibTemplateName)
}
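The template-selection branches above reduce to a small pure function; a sketch mirroring the diff's mapping (the function name is illustrative, the paths are the ones used above):

```go
package main

import "fmt"

// chaoslibTemplateFor mirrors the selection in createChaosLib: the cloud
// lib types share a common non-k8s template, while every other lib type
// gets its own per-type template file.
func chaoslibTemplateFor(libType string) string {
	switch libType {
	case "aws", "vmware", "gcp", "azure":
		return "./templates/chaoslib_non-k8s.tmpl"
	default:
		return fmt.Sprintf("./templates/chaoslib_%s.tmpl", libType)
	}
}

func main() {
	fmt.Println(chaoslibTemplateFor("aws"))  // ./templates/chaoslib_non-k8s.tmpl
	fmt.Println(chaoslibTemplateFor("exec")) // ./templates/chaoslib_exec.tmpl
}
```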
// createENV creates the env getter and setter files
func createENV(litmusRootDir string, experimentDetails types.Experiment) error {
func createENV(litmusRootDir string, experimentDetails types.Experiment, libType string) error {
// creating the directory for the environment variables file, if not present
experimentPKGDirectory := litmusRootDir + "/pkg/" + experimentDetails.Category
createDirectoryIfNotPresent(experimentPKGDirectory)
@ -188,13 +201,13 @@ func createENV(litmusRootDir string, experimentDetails types.Experiment) error {
// generating the environment var file
environmentFilePath := environmentDIR + "/" + "environment.go"
if err := generateFile(experimentDetails, environmentFilePath, "./templates/environment.tmpl"); err != nil {
if err := generateFile(experimentDetails, environmentFilePath, fmt.Sprintf("./templates/environment_%s.tmpl", libType)); err != nil {
return err
}
// generating the types.go file
typesFilePath := typesDIR + "/" + "types.go"
if err := generateFile(experimentDetails, typesFilePath, "./templates/types.tmpl"); err != nil {
if err := generateFile(experimentDetails, typesFilePath, fmt.Sprintf("./templates/types_%s.tmpl", libType)); err != nil {
return err
}
return nil
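All of the `generateFile` calls above render a Go `text/template` with the experiment attributes. A minimal sketch of that rendering step, assuming the templates use `{{ .Category }}`/`{{ .Name }}` placeholders as shown elsewhere in this diff (`renderTemplate` and the trimmed `Experiment` struct are illustrative, not the SDK's actual types):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Experiment holds the subset of attributes the templates reference.
type Experiment struct {
	Name     string
	Category string
}

// renderTemplate parses a Go text/template and executes it with the
// experiment attributes, returning the rendered file contents.
func renderTemplate(tmpl string, exp Experiment) (string, error) {
	t, err := template.New("experiment").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, exp); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderTemplate(
		"litmus-go/pkg/{{ .Category }}/{{ .Name }}/types",
		Experiment{Name: "pod-cpu-hog", Category: "generic"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // litmus-go/pkg/generic/pod-cpu-hog/types
}
```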

View File

@ -1,7 +1,11 @@
package lib
import (
"context"
"fmt"
"os"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/palantir/stacktrace"
"os/signal"
"syscall"
"time"
@ -13,7 +17,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
@ -25,18 +28,24 @@ func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, podName
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("unable to run command inside target container, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to inject chaos: %s", err.Error())}
}
return nil
}
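The diff swaps `errors.Errorf` for structured `cerrors.Error` values carrying a code, a target, and a reason. A minimal self-contained sketch of such a coded error type (the `Error`/`ErrorType` names, the constants, and the string format here are illustrative, not the actual `cerrors` package):

```go
package main

import "fmt"

// ErrorType is a machine-readable error-code string (assumed values).
type ErrorType string

const (
	ErrorTypeChaosInject ErrorType = "CHAOS_INJECT_ERROR"
	ErrorTypeChaosRevert ErrorType = "CHAOS_REVERT_ERROR"
)

// Error is a structured chaos error: a code for classification, the
// affected target, and a human-readable reason.
type Error struct {
	ErrorCode ErrorType
	Target    string
	Reason    string
}

// Error implements the error interface.
func (e Error) Error() string {
	return fmt.Sprintf("[%s] target: %s, reason: %s", e.ErrorCode, e.Target, e.Reason)
}

func main() {
	err := Error{
		ErrorCode: ErrorTypeChaosInject,
		Target:    "{podName: nginx-0, namespace: default}",
		Reason:    "failed to inject chaos: command failed",
	}
	fmt.Println(err.Error())
}
```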
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@ -45,23 +54,29 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
}
log.Infof("Target pods list for chaos, %v", podNames)
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
return runChaos(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
}
func runChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
return runChaos(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
}
func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
@ -99,13 +114,17 @@ func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodLi
}
}
if err := killChaos(experimentsDetails, pod.Name, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectChaos")
// defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -113,8 +132,8 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
return err
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
return stacktrace.Propagate(err, "could not execute experiment")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
@ -133,7 +152,7 @@ func killChaos(experimentsDetails *experimentTypes.ExperimentDetails, podName st
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("unable to kill the process in %v pod, err: %v", podName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}
}
return nil
}

View File

@ -0,0 +1,173 @@
package lib
import (
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/palantir/stacktrace"
"context"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
// Getting the serviceAccountName; the helper pod needs its permissions to create events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
return runChaos(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
}
func runChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
// creating the helper pod to perform the chaos injection
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
runID := stringutils.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"Target Container": experimentsDetails.TargetContainer,
})
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
//PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Prepare[name-your-chaos]Fault")
// defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the chaos experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
return stacktrace.Propagate(err, "could not execute chaos")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Create[name-your-chaos]FaultHelperPod")
// defer span.End()
helperPod := &corev1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: corev1.PodSpec{
RestartPolicy: corev1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
NodeName: nodeName,
Containers: []corev1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: corev1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
"-c",
},
Args: []string{
// with "-c", bash executes only the first argument, so chain the commands in one string
"echo This is a sample pod && sleep 10",
},
Resources: chaosDetails.Resources,
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}

View File

@ -0,0 +1,231 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
)
var (
err error
inject, abort chan os.Signal
)
//PrepareChaos contains the preparation and injection steps for the experiment
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Prepare[name-your-chaos]Fault")
// defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// @TODO: user FILTER THE TARGETS
// FILTER OUT THE TARGET SERVICES EITHER BY ID OR BY TAGS
// THIS TEMPLATE CONTAINS SELECTION BY ID; FOR TAGS, ADD/CALL A FUNCTION HERE
if experimentsDetails.TargetID == "" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no target id found"}
}
targetIDList := strings.Split(experimentsDetails.TargetID, ",")
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, targetIDList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
//injectChaosInSerialMode will inject the chaos on the targets one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Inject[name-your-chaos]FaultInSerialMode")
// defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target ID list, %v", targetIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, id := range targetIDList {
// @TODO: user CHAOS-INJECTION-LOGIC
// PLACE YOUR CHAOS-INJECTION-LOGIC HERE BASED ON YOUR HYPOTHESIS
// FOR EXAMPLE TO PERFORM INSTANCE STOP CALL STOP API HERE
// @TODO: REPLACE THE TARGET WITH THE SERVICE UNDER CHAOS
common.SetTargets(id, "injected", "TARGET", chaosDetails)
log.Infof("[Wait]: Wait for chaos to be injected completely on: '%v'", id)
// @TODO: user WAIT-FOR-CHAOS-INJECTION
// WAIT UNTIL THE CHAOS IS INJECTED COMPLETELY
// The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// @TODO: user REVERT-CHAOS TO NORMAL STATE
// ADD THE LOGIC TO REMOVE THE CHAOS AND GET THE SERVICE IN HEALTHY STATE
// @TODO: user WAIT-FOR-CHAOS-REVERT
// WAIT UNTIL THE CHAOS IS COMPLETELY REMOVED AND THE SERVICE IS HEALTHY AGAIN
// @TODO: REPLACE THE TARGET WITH THE SERVICE UNDER CHAOS
common.SetTargets(id, "reverted", "TARGET", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode will inject the chaos on the target all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Inject[name-your-chaos]FaultInParallelMode")
// defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target ID list, %v", targetIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for _, id := range targetIDList {
// @TODO: user CHAOS-INJECTION-LOGIC
// PLACE YOUR CHAOS-INJECTION-LOGIC HERE BASED ON YOUR HYPOTHESIS
// FOR EXAMPLE TO PERFORM INSTANCE STOP CALL STOP API HERE
// @TODO: REPLACE THE TARGET WITH THE SERVICE UNDER CHAOS
common.SetTargets(id, "injected", "TARGET", chaosDetails)
}
for range targetIDList {
// @TODO: user WAIT-FOR-CHAOS-INJECTION
// WAIT UNTIL THE CHAOS IS INJECTED COMPLETELY
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// @TODO: user REVERT-CHAOS TO NORMAL STATE
// ADD THE LOGIC TO REMOVE THE CHAOS AND GET THE SERVICE IN HEALTHY STATE
// @TODO: user WAIT-FOR-CHAOS-REVERT
// WAIT UNTIL THE CHAOS IS COMPLETELY REMOVED AND THE SERVICE IS HEALTHY AGAIN
for _, id := range targetIDList {
// @TODO: REPLACE THE TARGET WITH THE SERVICE UNDER CHAOS
common.SetTargets(id, "reverted", "TARGET", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// watching for the abort signal and revert the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, id := range targetIDList {
// @TODO: user REVERT-CHAOS TO NORMAL STATE
// ADD THE LOGIC TO REMOVE THE CHAOS AND GET THE SERVICE IN HEALTHY STATE
// @TODO: REPLACE THE TARGET WITH THE SERVICE UNDER CHAOS
common.SetTargets(id, "reverted", "TARGET", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

View File

@ -0,0 +1,32 @@
package environment
import (
"strconv"
clientTypes "k8s.io/apimachinery/pkg/types"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
// ADD GETENV CALLS OF YOUR CHOICE HERE
// A FEW MANDATORY FIELDS ARE ALREADY ADDED
//GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.TargetID = types.Getenv("TARGET_ID", "")
experimentDetails.Region = types.Getenv("REGION", "")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.ManagedNodegroup = types.Getenv("MANAGED_NODEGROUP", "disable")
experimentDetails.Sequence = types.Getenv("SEQUENCE", "parallel")
}

View File

@ -0,0 +1,32 @@
package environment
import (
"strconv"
clientTypes "k8s.io/apimachinery/pkg/types"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
// ADD GETENV CALLS OF YOUR CHOICE HERE
// A FEW MANDATORY FIELDS ARE ALREADY ADDED
//GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.TargetID = types.Getenv("TARGET_ID", "")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.ResourceGroup = types.Getenv("RESOURCE_GROUP", "")
experimentDetails.ScaleSet = types.Getenv("SCALE_SET", "disable")
experimentDetails.Sequence = types.Getenv("SEQUENCE", "parallel")
}

View File

@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.AppKind = types.Getenv("APP_KIND", "")

View File

@ -0,0 +1,35 @@
package environment
import (
"strconv"
clientTypes "k8s.io/apimachinery/pkg/types"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
// ADD GETENV CALLS OF YOUR CHOICE HERE
// A FEW MANDATORY FIELDS ARE ALREADY ADDED
//GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.TargetID = types.Getenv("TARGET_ID", "")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.GCPProjectID = types.Getenv("GCP_PROJECT_ID", "")
experimentDetails.InstanceZone = types.Getenv("INSTANCE_ZONES", "")
experimentDetails.ManagedInstanceGroup = types.Getenv("MANAGED_INSTANCE_GROUP", "disable")
experimentDetails.Sequence = types.Getenv("SEQUENCE", "parallel")
experimentDetails.InstanceLabel = types.Getenv("INSTANCE_LABEL", "")
experimentDetails.InstanceAffectedPerc, _ = strconv.Atoi(types.Getenv("INSTANCE_AFFECTED_PERC", "0"))
}

View File

@ -0,0 +1,39 @@
package environment
import (
"strconv"
clientTypes "k8s.io/apimachinery/pkg/types"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
// ADD GETENV CALLS OF YOUR CHOICE HERE
// A FEW MANDATORY FIELDS ARE ALREADY ADDED
//GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.AppKind = types.Getenv("APP_KIND", "")
experimentDetails.AuxiliaryAppInfo = types.Getenv("AUXILIARY_APPINFO", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.TargetContainer = types.Getenv("TARGET_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("TARGET_PODS", "")
experimentDetails.PodsAffectedPerc, _ = strconv.Atoi(types.Getenv("PODS_AFFECTED_PERC", "0"))
experimentDetails.LIBImagePullPolicy = types.Getenv("LIB_IMAGE_PULL_POLICY", "Always")
experimentDetails.LIBImage = types.Getenv("LIB_IMAGE", "litmuschaos/go-runner:latest")
experimentDetails.SetHelperData = types.Getenv("SET_HELPER_DATA", "true")
experimentDetails.ChaosServiceAccount = types.Getenv("CHAOS_SERVICE_ACCOUNT", "")
}
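The `types.Getenv` helper these templates rely on is not shown in this diff. Assuming it returns a fallback when the variable is unset, a minimal stand-in looks like the sketch below; it also shows a safer alternative to the `v, _ := strconv.Atoi(...)` pattern above, which silently turns a malformed value into 0. The names `getenv` and `atoiOrDefault` are illustrative, not the repo's API.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// getenv mirrors the assumed behaviour of types.Getenv: return the value of
// the environment variable, or fallback when it is unset or empty.
func getenv(key, fallback string) string {
	if value, ok := os.LookupEnv(key); ok && value != "" {
		return value
	}
	return fallback
}

// atoiOrDefault avoids the silent-zero pitfall of `v, _ := strconv.Atoi(...)`:
// a malformed value falls back to the default instead of becoming 0.
func atoiOrDefault(s string, def int) int {
	if n, err := strconv.Atoi(s); err == nil {
		return n
	}
	return def
}

func main() {
	os.Setenv("TOTAL_CHAOS_DURATION", "not-a-number")
	duration := atoiOrDefault(getenv("TOTAL_CHAOS_DURATION", "30"), 30)
	fmt.Println(duration) // falls back to 30 rather than 0
}
```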

View File

@ -0,0 +1,33 @@
package environment
import (
"strconv"
clientTypes "k8s.io/apimachinery/pkg/types"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
// ADD THE GETENV CALLS OF YOUR CHOICE HERE
// A FEW MANDATORY FIELDS ARE ADDED BY DEFAULT
// GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.TargetID = types.Getenv("TARGET_ID", "")
experimentDetails.VcenterServer = types.Getenv("VCENTERSERVER", "")
experimentDetails.VcenterUser = types.Getenv("VCENTERUSER", "")
experimentDetails.VcenterPass = types.Getenv("VCENTERPASS", "")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.Sequence = types.Getenv("SEQUENCE", "parallel")
}

View File

@ -0,0 +1,182 @@
package experiment
import (
"context"
"os"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
aws "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
)
// Experiment contains steps to inject chaos
func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
//Fetching all the ENV passed from the runner pod
log.Infof("[PreReq]: Getting the ENV for the %v experiment", os.Getenv("EXPERIMENT_NAME"))
experimentEnv.GetENV(&experimentsDetails)
// Initialize the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
// Initialize Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result at the beginning of the experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
// generating the event in chaosresult to mark the verdict as awaited
msg := "experiment: " + experimentsDetails.ExperimentName + ", Result: Awaited"
types.SetResultEventAttributes(&eventsDetails, types.AwaitedVerdict, msg, "Normal", &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
//DISPLAY THE TARGET INFORMATION
log.InfoWithValues("[Info]: The EC2 instance information is as follows", logrus.Fields{
"TargetID": experimentsDetails.TargetID,
"Region": experimentsDetails.Region,
"Chaos Duration": experimentsDetails.ChaosDuration,
})
// Calling the AbortWatcher goroutine; it will continuously watch for the abort signal and generate the required events and result
go common.AbortWatcher(experimentsDetails.ExperimentName, clients, &resultDetails, &chaosDetails, &eventsDetails)
// @TODO: user PRE-CHAOS-CHECK
// ADD A PRE-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
//Verify the aws ec2 instance is running (pre chaos)
log.Info("[Status]: Verify that the aws ec2 instances are in running state (pre-chaos)")
if err := aws.InstanceStatusCheckByID(experimentsDetails.TargetID, experimentsDetails.Region); err != nil {
log.Errorf("failed to get the ec2 instance status, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: EC2 instance is in running state")
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := "AUT: Running"
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// generating the events for the pre-chaos check
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
// INVOKE THE CHAOSLIB OF YOUR CHOICE HERE, WHICH WILL CONTAIN
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
//Verify the aws ec2 instance is running (post chaos)
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Status]: Verify that the aws ec2 instances are in running state (post-chaos)")
if err := aws.InstanceStatusCheckByID(experimentsDetails.TargetID, experimentsDetails.Region); err != nil {
log.Errorf("failed to get the ec2 instance status, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: EC2 instance is in running state (post chaos)")
}
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := "AUT: Running"
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// generating post chaos event
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
//Updating the chaosResult at the end of the experiment
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
if experimentsDetails.EngineName != "" {
msg := experimentsDetails.ExperimentName + " experiment has been " + string(resultDetails.Verdict) + "ed"
types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
}

View File

@ -0,0 +1,188 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
azureCommon "github.com/litmuschaos/litmus-go/pkg/cloud/azure/common"
azureStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
)
// Experiment contains steps to inject chaos
func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
var err error
//Fetching all the ENV passed from the runner pod
log.Infof("[PreReq]: Getting the ENV for the %v experiment", os.Getenv("EXPERIMENT_NAME"))
experimentEnv.GetENV(&experimentsDetails)
// Initialize the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
// Initialize Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result at the beginning of the experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
// generating the event in chaosresult to mark the verdict as awaited
msg := "experiment: " + experimentsDetails.ExperimentName + ", Result: Awaited"
types.SetResultEventAttributes(&eventsDetails, types.AwaitedVerdict, msg, "Normal", &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
//DISPLAY THE TARGET INFORMATION
log.InfoWithValues("The instance information is as follows", logrus.Fields{
"Chaos Duration": experimentsDetails.ChaosDuration,
"Resource Group": experimentsDetails.ResourceGroup,
"Instance Name": experimentsDetails.TargetID,
"Sequence": experimentsDetails.Sequence,
})
// Calling the AbortWatcher goroutine; it will continuously watch for the abort signal and generate the required events and result
go common.AbortWatcher(experimentsDetails.ExperimentName, clients, &resultDetails, &chaosDetails, &eventsDetails)
// Setting up Azure Subscription ID
if experimentsDetails.SubscriptionID, err = azureCommon.GetSubscriptionID(); err != nil {
log.Errorf("failed to get the subscription id, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// @TODO: user PRE-CHAOS-CHECK
// ADD A PRE-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
//Verify the azure target instance is running (pre-chaos)
if err := azureStatus.InstanceStatusCheckByName(experimentsDetails.TargetID, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup); err != nil {
log.Errorf("failed to get the azure instance status, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: Azure instance(s) is in running state (pre-chaos)")
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := "AUT: Running"
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// generating the events for the pre-chaos check
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
// INVOKE THE CHAOSLIB OF YOUR CHOICE HERE, WHICH WILL CONTAIN
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
//Verify the azure instance is running (post chaos)
if err := azureStatus.InstanceStatusCheckByName(experimentsDetails.TargetID, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup); err != nil {
log.Errorf("failed to get the azure instance status, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: Azure instance is in running state (post chaos)")
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := "AUT: Running"
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// generating post chaos event
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
//Updating the chaosResult at the end of the experiment
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
if experimentsDetails.EngineName != "" {
msg := experimentsDetails.ExperimentName + " experiment has been " + string(resultDetails.Verdict) + "ed"
types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
}

View File

@ -43,9 +43,6 @@ spec:
- name: CHAOS_INTERVAL
value: ''
- name: LIB
value: ''
- name: RAMP_TIME
value: ''

View File

@ -0,0 +1,189 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
)
// Experiment contains steps to inject chaos
func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
//Fetching all the ENV passed from the runner pod
log.Infof("[PreReq]: Getting the ENV for the %v experiment", os.Getenv("EXPERIMENT_NAME"))
experimentEnv.GetENV(&experimentsDetails)
// Initialize the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
// Initialize Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result at the beginning of the experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
// generating the event in chaosresult to mark the verdict as awaited
msg := "experiment: " + experimentsDetails.ExperimentName + ", Result: Awaited"
types.SetResultEventAttributes(&eventsDetails, types.AwaitedVerdict, msg, "Normal", &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
//DISPLAY THE TARGET INFORMATION
log.InfoWithValues("The VM instance information is as follows", logrus.Fields{
"Instance Names": experimentsDetails.TargetID,
"Zones": experimentsDetails.InstanceZone,
"Sequence": experimentsDetails.Sequence,
})
// Calling the AbortWatcher goroutine; it will continuously watch for the abort signal and generate the required events and result
go common.AbortWatcher(experimentsDetails.ExperimentName, clients, &resultDetails, &chaosDetails, &eventsDetails)
// Create a compute service to access the compute engine resources
computeService, err := gcp.GetGCPComputeService()
if err != nil {
log.Errorf("failed to obtain a gcp compute service, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// Verify that the GCP VM instance(s) is in RUNNING state (pre-chaos)
if err := gcp.InstanceStatusCheckByName(computeService, experimentsDetails.ManagedInstanceGroup, experimentsDetails.Delay, experimentsDetails.Timeout, "pre-chaos", experimentsDetails.TargetID, experimentsDetails.GCPProjectID, experimentsDetails.InstanceZone); err != nil {
log.Errorf("failed to get the vm instance status, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: VM instance is in running state (pre-chaos)")
// @TODO: user PRE-CHAOS-CHECK
// ADD A PRE-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := "AUT: Running"
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// generating the events for the pre-chaos check
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
// INVOKE THE CHAOSLIB OF YOUR CHOICE HERE, WHICH WILL CONTAIN
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
//Verify the GCP VM instance is in RUNNING status (post-chaos)
if err := gcp.InstanceStatusCheckByName(computeService, experimentsDetails.ManagedInstanceGroup, experimentsDetails.Delay, experimentsDetails.Timeout, "post-chaos", experimentsDetails.TargetID, experimentsDetails.GCPProjectID, experimentsDetails.InstanceZone); err != nil {
log.Errorf("failed to get the vm instance status, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: VM instance is in running state (post-chaos)")
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := "AUT: Running"
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// generating post chaos event
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
//Updating the chaosResult at the end of the experiment
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
if experimentsDetails.EngineName != "" {
msg := experimentsDetails.ExperimentName + " experiment has been " + string(resultDetails.Verdict) + "ed"
types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
}
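`gcp.InstanceStatusCheckByName` receives `Delay` and `Timeout` from GetENV (defaulting to 2 and 180 seconds). Assuming the conventional use of this pair — poll every `delay` seconds until the instance reports RUNNING or `timeout` seconds elapse — a minimal stand-in looks like this; `pollStatus` is illustrative and does not reflect the real function's signature, which wraps the Compute Engine API.

```go
package main

import (
	"fmt"
	"time"
)

// pollStatus polls getStatus every delay seconds until it reports RUNNING
// or timeout seconds elapse, mirroring how a Delay/Timeout pair is
// conventionally consumed by a status check.
func pollStatus(timeout, delay int, getStatus func() string) error {
	deadline := time.Now().Add(time.Duration(timeout) * time.Second)
	for {
		if getStatus() == "RUNNING" {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("instance not in RUNNING state within %ds", timeout)
		}
		time.Sleep(time.Duration(delay) * time.Second)
	}
}

func main() {
	// Simulate an instance that transitions to RUNNING on the third poll.
	states := []string{"PROVISIONING", "STAGING", "RUNNING"}
	i := 0
	err := pollStatus(5, 0, func() string {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return s
	})
	fmt.Println(err) // prints "<nil>"
}
```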

View File

@ -1,7 +1,10 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
"github.com/litmuschaos/litmus-go/pkg/events"
@ -16,9 +19,8 @@ import (
"github.com/sirupsen/logrus"
)
// Experiment contains steps to inject chaos
func Experiment(clients clients.ClientSets){
func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
@ -36,19 +38,18 @@ func Experiment(clients clients.ClientSets){
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@ -73,27 +74,27 @@ func Experiment(clients clients.ClientSets){
// @TODO: user PRE-CHAOS-CHECK
// ADD A PRE-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
//PRE-CHAOS APPLICATION STATUS CHECK
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)")
if err := status.AUTStatusCheck(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.TargetContainer, experimentsDetails.Timeout, experimentsDetails.Delay, clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err)
failStep := "[pre-chaos]: Failed to verify that the AUT (Application Under Test) is in running state, err: " + err.Error()
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)")
if err := status.AUTStatusCheck(clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err)
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
{{ if eq .AuxiliaryAppCheck true }}
//PRE-CHAOS AUXILIARY APPLICATION STATUS CHECK
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running (pre-chaos)")
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients);err != nil {
log.Errorf("Auxiliary Application status check failed, err: %v", err)
failStep := "[pre-chaos]: Failed to verify that the Auxiliary Applications are in running state, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
log.Info("[Status]: Verify that the Auxiliary Applications are running (pre-chaos)")
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Errorf("Auxiliary Application status check failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
{{- end }}
@@ -104,13 +105,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@@ -125,44 +125,39 @@ func Experiment(clients clients.ClientSets){
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib
switch experimentsDetails.ChaosLib {
case "litmus":
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
//POST-CHAOS APPLICATION STATUS CHECK
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (post-chaos)")
if err := status.AUTStatusCheck(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.TargetContainer, experimentsDetails.Timeout, experimentsDetails.Delay, clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err)
failStep := "[post-chaos]: Failed to verify that the AUT (Application Under Test) is running, err: " + err.Error()
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (post-chaos)")
if err := status.AUTStatusCheck(clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err)
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
{{ if eq .AuxiliaryAppCheck true }}
//POST-CHAOS AUXILIARY APPLICATION STATUS CHECK
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running (post-chaos)")
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients);err != nil {
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Errorf("Auxiliary Application status check failed, err: %v", err)
failStep := "[post-chaos]: Failed to verify that the Auxiliary Applications are running, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
@@ -174,13 +169,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@@ -196,17 +190,13 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to marked the verdict as pass/fail
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
@@ -215,4 +205,4 @@ func Experiment(clients clients.ClientSets){
types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
}
}


@@ -45,11 +45,6 @@ spec:
- name: RAMP_TIME
value: ''
## env var that describes the library used to execute the chaos
## default: litmus. Supported values: litmus, powerfulseal, chaoskube
- name: LIB
value: ''
# provide the chaos namespace
- name: CHAOS_NAMESPACE
value: ''


@@ -0,0 +1,194 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/vmware"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
)
// Experiment contains steps to inject chaos
func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
//Fetching all the ENV passed from the runner pod
log.Infof("[PreReq]: Getting the ENV for the %v experiment", os.Getenv("EXPERIMENT_NAME"))
experimentEnv.GetENV(&experimentsDetails)
// Initialize the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
// Initialize Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result at the beginning of the experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
// generating the event in chaosresult to mark the verdict as awaited
msg := "experiment: " + experimentsDetails.ExperimentName + ", Result: Awaited"
types.SetResultEventAttributes(&eventsDetails, types.AwaitedVerdict, msg, "Normal", &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
//DISPLAY THE TARGET INFORMATION
log.InfoWithValues("[Info]: The Instance information is as follows", logrus.Fields{
"VM MOIDS": experimentsDetails.TargetID,
"Ramp Time": experimentsDetails.RampTime,
"Chaos Duration": experimentsDetails.ChaosDuration,
})
// Calling AbortWatcher go routine, it will continuously watch for the abort signal and generate the required events and result
go common.AbortWatcher(experimentsDetails.ExperimentName, clients, &resultDetails, &chaosDetails, &eventsDetails)
// GET SESSION ID TO LOGIN TO VCENTER
cookie, err := vmware.GetVcenterSessionID(experimentsDetails.VcenterServer, experimentsDetails.VcenterUser, experimentsDetails.VcenterPass)
if err != nil {
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
log.Errorf("Vcenter Login failed, err: %v", err)
return
}
// @TODO: user PRE-CHAOS-CHECK
// ADD A PRE-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
// PRE-CHAOS VM STATUS CHECK
if err := vmware.VMStatusCheck(experimentsDetails.VcenterServer, experimentsDetails.TargetID, cookie); err != nil {
log.Errorf("Failed to get the VM status, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Verification]: VMs are in running state (pre-chaos)")
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := "AUT: Running"
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// generating the events for the pre-chaos check
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
// INVOKE THE CHAOSLIB OF YOUR CHOICE HERE, WHICH WILL CONTAIN
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
// POD STATUS CHECKS FOR THE APPLICATION UNDER TEST AND AUXILIARY APPLICATIONS ARE ADDED BY DEFAULT
//POST-CHAOS VM STATUS CHECK
log.Info("[Status]: Verify that the IUT (Instance Under Test) is running (post-chaos)")
if err := vmware.VMStatusCheck(experimentsDetails.VcenterServer, experimentsDetails.TargetID, cookie); err != nil {
log.Errorf("Failed to get the VM status, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Verification]: VMs are in running state (post-chaos)")
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := "AUT: Running"
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// generating post chaos event
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
//Updating the chaosResult at the end of the experiment
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
if experimentsDetails.EngineName != "" {
msg := experimentsDetails.ExperimentName + " experiment has been " + string(resultDetails.Verdict) + "ed"
types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
}


@@ -0,0 +1,28 @@
package types
import (
clientTypes "k8s.io/apimachinery/pkg/types"
)
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct {
ExperimentName string
EngineName string
ChaosDuration int
ChaosInterval int
RampTime int
ChaosUID clientTypes.UID
InstanceID string
ChaosNamespace string
ChaosPodName string
Timeout int
Delay int
ChaosServiceAccount string
TargetID string
Region string
ManagedNodegroup string
Sequence string
}


@@ -0,0 +1,28 @@
package types
import (
clientTypes "k8s.io/apimachinery/pkg/types"
)
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct {
ExperimentName string
EngineName string
ChaosDuration int
ChaosInterval int
RampTime int
ChaosUID clientTypes.UID
InstanceID string
ChaosNamespace string
ChaosPodName string
Timeout int
Delay int
TargetID string
ResourceGroup string
SubscriptionID string
ScaleSet string
Sequence string
}
