Compare commits

...

304 Commits

Author SHA1 Message Date
Neelanjan Manna e7b4e7dbe4
chore: adds retries with timeout for litmus and k8s client operations (#766)
* chore: adds retries for k8s api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds retries for litmus api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 15:41:34 +05:30
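
(For illustration only: the commit above wraps Litmus and Kubernetes API operations in retries bounded by a timeout. A minimal, hypothetical Go sketch of such a retry-with-timeout pattern — not the actual helper merged in #766 — could look like this.)

```go
// Hypothetical sketch of a retry-with-timeout wrapper (illustrative, not the code from #766).
package retry

import (
	"fmt"
	"time"
)

// WithTimeout retries op at the given interval until it succeeds or the
// overall timeout elapses, returning the last error on expiry.
func WithTimeout(timeout, interval time.Duration, op func() error) error {
	deadline := time.Now().Add(timeout)
	var lastErr error
	for {
		if lastErr = op(); lastErr == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("operation not successful within %s: %w", timeout, lastErr)
		}
		time.Sleep(interval)
	}
}
```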
Neelanjan Manna 62a4986c78
chore: adds common functions for helper pod lifecycle management (#764)
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 12:18:29 +05:30
Neelanjan Manna d626cf3ec4
Merge pull request #761 from litmuschaos/CHAOS-9404
feat: adds port filtering for ip/hostnames for network faults, adds pod-network-rate-limit fault
2025-08-13 16:40:51 +05:30
neelanjan00 59125424c3
feat: adds ip+port filtering, adds pod-network-rate-limit fault
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-13 16:13:24 +05:30
Neelanjan Manna 2e7ff836fc
feat: Adds multi container support for pod stress faults (#757)
* chore: Fix typo in log statement

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds multi-container stress chaos system with improved lifecycle management and better error handling

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-08-13 16:04:20 +05:30
Prexy e61d5b33be
written test for `workload.go` in `pkg/workloads` (#767)
* written test for workload.go in pkg/workloads

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* checking go formatting

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-08-12 17:30:22 +05:30
Prexy 14fe30c956
test: add unit tests for exec.go file in pkg/utils folder (#755)
* test: add unit tests for exec.go file in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* creating table driven test and also updates TestCheckPodStatus

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-07-24 15:33:25 +05:30
Prexy 4ae08899e0
test: add unit tests for retry.go in pkg/utils folder (#754)
* test: add unit tests for retry.go in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:42 +05:30
Prexy 2c38220cca
test: add unit tests for RandStringBytesMask and GetRunID in stringutils (#753)
* test: add unit tests for RandStringBytesMask and GetRunID in stringutils

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:26 +05:30
Sami S. 07de11eeee
Fix: handle pagination in ssm describeInstanceInformation & API Rate Limit (#738)
* Fix: handle pagination in ssm describe

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* implement exponential backoff with jitter for API rate limiting

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Refactor

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixup

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Fix: include error message from stderr if container-kill fails (#740) (#741)

Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(logs): Fix the error logs for container-kill fault (#745)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(container-kill): Fixed the container stop command timeout issue (#747)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* feat: Add a rds-instance-stop chaos fault (#710)

* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix go fmt ./...

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Filter instances on api call

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixes lint

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>

---------

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>
Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Jongwoo Han <jongwooo.han@gmail.com>
Co-authored-by: Udit Gaurav <udit.gaurav@harness.io>
2025-04-30 10:25:10 +05:30
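
(For illustration only: part of the commit above adds exponential backoff with jitter when AWS SSM API calls hit rate limits. A hypothetical Go sketch of that backoff pattern — not the code merged in #738 — is shown below.)

```go
// Hypothetical sketch of exponential backoff with jitter for rate-limited API calls
// (illustrative only; not the implementation from #738).
package backoff

import (
	"math"
	"math/rand"
	"time"
)

// Do retries call up to maxAttempts times, sleeping base*2^attempt plus a
// random jitter between attempts to avoid synchronized retries.
func Do(maxAttempts int, base time.Duration, call func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = call(); err == nil {
			return nil
		}
		// Exponential backoff with jitter: base * 2^attempt + random(0, base).
		sleep := time.Duration(float64(base) * math.Pow(2, float64(attempt)))
		jitter := time.Duration(rand.Int63n(int64(base)))
		time.Sleep(sleep + jitter)
	}
	return err
}
```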
Jongwoo Han 5c22472290
feat: Add a rds-instance-stop chaos fault (#710)
* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2025-04-24 12:54:05 +05:30
Shubham Chaudhary e7b3fb6f9f
fix(container-kill): Fixed the container stop command timeout issue (#747)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-04-15 18:20:23 +05:30
Shubham Chaudhary e1eaea9110
fix(logs): Fix the error logs for container-kill fault (#745)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-04-03 15:35:00 +05:30
Björn Kylberg 491dc5e31a
Fix: include error message from stderr if container-kill fails (#740) (#741)
Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
2025-04-03 14:44:05 +05:30
Shubham Chaudhary caae228e35
(chore): fix the go fmt of the files (#734)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-01-17 12:08:34 +05:30
kbfu 34a62d87f3
fix the cgroup 2 problem (#677)
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-01-17 11:29:30 +05:30
Suhyen Im 8246ff891b
feat: propagate trace context to helper pods (#722)
Signed-off-by: Suhyen Im <suhyenim.kor@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Saranya Jena <saranya.jena@harness.io>
2025-01-15 16:34:19 +05:30
Namkyu Park 9b29558585
feat: export k6 results output to the OTEL collector (#726)
* Export k6 results to the otel collector

Signed-off-by: namkyu1999 <lak9348@gmail.com>

* add envs for multiple projects

Signed-off-by: namkyu1999 <lak9348@gmail.com>

---------

Signed-off-by: namkyu1999 <lak9348@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Saranya Jena <saranya.jena@harness.io>
2025-01-15 16:33:43 +05:30
Sayan Mondal c7ab5a3d7c
Merge pull request #732 from heysujal/add-openssh-clients
add openssh-clients to dockerfile
2025-01-15 11:28:17 +05:30
Shubham Chaudhary 3bef3ad67e
Merge branch 'master' into add-openssh-clients 2025-01-15 10:57:02 +05:30
Sujal Gupta b2f68a6ad1
use revertErr instead of err (#730)
Signed-off-by: Sujal Gupta <sujalgupta6100@gmail.com>
2025-01-15 10:38:32 +05:30
Sujal Gupta cd2ec26083 add openssh-clients to dockerfile
Signed-off-by: Sujal Gupta <sujalgupta6100@gmail.com>
2025-01-06 01:04:25 +05:30
Shubham Chaudhary 7e08c69750
chore(stress): Fix the stress faults (#723)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-11-20 15:18:59 +05:30
Namkyu Park 3ef23b01f9
feat: implement opentelemetry for distributed tracing (#706)
* feat: add otel & tracing for distributed tracing

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* feat: add tracing codes to chaslib

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: misc

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: make otel optional

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: skip if litmus-go not received trace_parent

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: Set context.Context as a parameter in each function

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* update templates

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* feat: rename spans and enhance coverage

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: avoid shadowing

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: add logs

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: add logs

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: fix templates

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

---------

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-10-24 16:14:57 +05:30
Shubham Chaudhary 0cd6c6fae3
(chore): Fix the build, push, and release pipelines (#716)
* (chore): Fix the build, push, and release pipelines

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* (chore): Fix the dockerfile

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-10-15 23:33:54 +05:30
Shubham Chaudhary 6a386d1410
(chore): Fix the disk-fill fault (#715)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-10-15 22:15:14 +05:30
Vedant Shrotria fc646d678c
Merge pull request #707 from dusdjhyeon/ubi-migration
UBI migration of Images - go-runner
2024-08-23 11:32:44 +05:30
dusdjhyeon 6257c1abb8
feat: add build arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 16:13:18 +09:00
dusdjhyeon 755a562efe
Merge branch 'ubi-migration' of https://github.com/dusdjhyeon/litmus-go into ubi-migration 2024-08-22 16:10:37 +09:00
dusdjhyeon d0814df9ea
fix: set build args
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 16:09:40 +09:00
Vedant Shrotria a6012039fd
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:42 +05:30
Vedant Shrotria a1f602ba98
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:33 +05:30
Vedant Shrotria 7476994a36
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:25 +05:30
Vedant Shrotria 3440fb84eb
Update .github/workflows/release.yml 2024-08-22 11:18:46 +05:30
Vedant Shrotria 652e6b8465
Update .github/workflows/release.yml 2024-08-22 11:18:39 +05:30
Vedant Shrotria 996f3b3f5f
Update .github/workflows/push.yml 2024-08-22 11:18:10 +05:30
Vedant Shrotria e73f3bfb21
Update .github/workflows/push.yml 2024-08-22 11:17:54 +05:30
Vedant Shrotria 054d091dce
Update .github/workflows/build.yml 2024-08-22 11:17:37 +05:30
Vedant Shrotria c362119e05
Update .github/workflows/build.yml 2024-08-22 11:17:15 +05:30
dusdjhyeon 31bf293140
fix: change go version and others
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 14:39:17 +09:00
Vedant Shrotria 9569c8b2f4
Merge branch 'master' into ubi-migration 2024-08-21 16:25:14 +05:30
dusdjhyeon 4f9f4e0540
fix: upgrade version for vulnerability
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:58 +09:00
dusdjhyeon 399ccd68a0
fix: change kubectl crictl latest version
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:58 +09:00
Jongwoo Han 35958eae38
Rename env to EC2_INSTANCE_TAG (#708)
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon 003a3dc02c
fix: change docker repo
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon d4eed32a6d
fix: change version arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon af7322bece
fix: app_dir and yum
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon bd853f6e25
feat: migration base image
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon cfdb205ca3
fix: typos and add arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
Jongwoo Han f051d5ac7c
Rename env to EC2_INSTANCE_TAG (#708)
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2024-08-14 16:42:35 +05:30
Andrii Kotelnikov 10e9b774a8
Update workloads.go (#705)
Fix issue with empty kind field
Signed-off-by: Andrii Kotelnikov <andrusha@ukr.net>
2024-06-14 14:16:47 +05:30
Vedant Shrotria 9689f74fce
Merge pull request #701 from Jonsy13/add-gitleaks
Adding `gitleaks` as PR Check
2024-05-20 10:27:09 +05:30
Vedant Shrotria d273ba628b
Merge branch 'master' into add-gitleaks 2024-05-17 17:37:15 +05:30
Jonsy13 2315eaf2a4
Added gitleaks
Signed-off-by: Jonsy13 <vedant.shrotria@harness.io>
2024-05-17 17:34:36 +05:30
Shubham Chaudhary f2b2c2747a
chore(io-stress): Fix the pod-io-stress experiment (#700)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-05-17 16:43:19 +05:30
Udit Gaurav 66d01011bb
Fix pipeline issues (#694)
Fix pipeline issues

---------

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
2024-04-26 14:17:01 +05:30
Udit Gaurav a440615a51
Fix gofmt issues (#695) 2024-04-25 23:45:59 +05:30
Shubham Chaudhary 78eec36b79
chore(probe): Fix the probe description on failure (#692)
* chore(probe): Fix the probe description on failure

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(probe): Consider http timeout as probe failure

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-04-23 18:06:48 +05:30
Michael Morris b5a24b4044
enable ALL for TARGET_CONTAINER (#683)
Signed-off-by: MichaelMorris <michael.morris@est.tech>
2024-03-14 19:44:18 +05:30
Shubham Chaudhary 6d26c21506
test: Adding fuzz testing for common util (#691)
* test: Adding fuzz testing for common util

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* fix the random interval test

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-03-12 17:02:01 +05:30
Namkyu Park 5554a29ea2
chore: fix typos (#690)
Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-03-11 20:26:50 +05:30
Sayan Mondal 5f0d882912
test: Adding fuzz testing for common util (#688) 2024-03-08 21:42:20 +05:30
Namkyu Park eef3b4021d
feat: Add a k6-loadgen chaos fault (#687)
* feat: add k6-loadgen

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-03-07 19:19:51 +05:30
smit thakkar 96f6571e77
fix: accommodate pending pods with no IP address in network fault (#684)
Signed-off-by: smit thakkar <smit.thakkar@deliveryhero.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-03-01 15:06:07 +05:30
Nageshbansal b9f897be21
Adds support for tolerations in source cmd probe (#681)
Signed-off-by: nagesh bansal <nageshbansal59@gmail.com>
2024-03-01 14:51:55 +05:30
Michael Morris c2f8f79ab9
Fix consider appKind when filtering target pods (#680)
* Fix consider appKind when filtering target pods

Signed-off-by: MichaelMorris <michael.morris@est.tech>

* Implemented review comment

Signed-off-by: MichaelMorris <michael.morris@est.tech>

---------

Signed-off-by: MichaelMorris <michael.morris@est.tech>
2024-03-01 14:41:29 +05:30
Nageshbansal 69927489d2
Fixes Probe logging for all iterations (#676)
* Fixes Probe logging for all iterations

Signed-off-by: nagesh bansal <nageshbansal59@gmail.com>
2024-01-11 17:48:26 +05:30
Shubham Chaudhary bdddd0d803
Add port blacklisting in the pod-network faults (#673)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-10-12 19:37:56 +05:30
Shubham Chaudhary 1b75f78632
fix(action): Fix the github release action (#672)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-09-29 16:02:01 +05:30
Calvinaud b710216113
Revert chaos when error during drain for node-drain experiments (#668)
- Added a call to uncordonNode in case of an error in the drainNode function

Signed-off-by: Calvin Audier <calvin.audier@gmail.com>
2023-09-21 23:54:33 +05:30
Shubham Chaudhary 392ea29800
chore(network): fix the destination ips for network experiment for service mesh (#666)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-09-15 11:00:34 +05:30
Shubham Chaudhary db13d05e28
Add fix to remove the job labels from helper pod (#665)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-07-24 13:09:57 +05:30
Vedant Shrotria d737281985
Merge pull request #661 from Jonsy13/group-optional-litmus-go
Upgrading chaos-operator version for making group optional in k8s probe
2023-06-05 13:05:51 +05:30
Jonsy13 61751a9404
Added changes for operator upgrade
Signed-off-by: Jonsy13 <vedant.shrotria@harness.io>
2023-06-05 12:34:12 +05:30
Shubham Chaudhary d4f9826ea9
chore(fields): Updating optional fields to pointer type (#658)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-25 14:02:22 +05:30
Shubham Chaudhary 3ab28a5110
run workflow on dispatch event and use token from secrets (#657)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 01:10:08 +05:30
Shubham Chaudhary 3005d02c24
use the official snyk action (#656)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 01:01:09 +05:30
Shubham Chaudhary 1971b8093b
fix the snyk token name (#655)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:35:26 +05:30
Shubham Chaudhary e5a831f713
fix the github workflow (#654)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:29:54 +05:30
Shubham Chaudhary 95c9602019
adding security scan workflow (#653)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:24:53 +05:30
Shubham Chaudhary f36b0761aa
adding security scan workflow (#652)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:21:19 +05:30
Shubham Chaudhary d3b760d76d
chore(unit): Adding units to the duration fields (#650)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-18 13:40:10 +05:30
Shubham Chaudhary 0bbe8e23e7
Revert "probe comparator logging for all iterations (#646)" (#649)
This reverts commit 8e0bbbbd5d.
2023-04-18 01:01:48 +05:30
Neelanjan Manna 5ade71c694
chore(probe): Update Probe failure descriptions and error codes (#648)
* adds probe description changes

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-04-17 17:24:23 +05:30
Shubham Chaudhary 8e0bbbbd5d
probe comparator logging for all iterations (#646)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-17 11:24:47 +05:30
Shubham Chaudhary d0b36e9a50
fix(probe): ProbeSuccessPercentage should not be 100% if experiment terminated with Error (#645)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-10 15:17:51 +05:30
Shubham Chaudhary eee4421c3c
chore(sdk): Updating the sdk to latest experiment schema (#644)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-03-20 17:01:46 +05:30
Neelanjan Manna a1c85ca52c
chore(experiments): Replaces default container runtime to containerd (#640)
* replaces default container runtime to containerd

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-03-14 19:41:02 +05:30
Shubham Chaudhary f8b370e6f4
add the experiment phase as completed with error (#642)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-03-09 21:52:17 +05:30
Neelanjan Manna 04c031a281
updates http probe wait duration to ms (#643)
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-03-08 12:46:21 +05:30
Shubham Chaudhary ea2b83e1a0
adding backend compatibility to probe retry (#639)
* adding backend compatibility to probe retry

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating the chaos-operator version

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-22 10:03:56 +05:30
Shubham Chaudhary 291ae4a6ad
chore(error-verdict): Adding experiment verdict as error (#637)
* chore(error-verdict): Adding experiment verdict as error

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating error verdict

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating the chaos-operator version

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* adding comments and changing function name

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-21 23:37:56 +05:30
Akash Shrivastava 8b68c4b5cb
Added filtering vm instance by tag (#635)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2023-02-15 16:48:47 +05:30
Shubham Chaudhary 7bdb18016f
chore(probe): updating retries to attempts and using the timeout as per-attempt timeout (#636)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-09 17:02:31 +05:30
Shubham Chaudhary 4aa778ef9c
chore(probe-timeout): converting probe timeout in milli seconds (#634)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-05 01:34:39 +05:30
Shubham Chaudhary 1f02800c23
chore(parallel): add support to create unique runid for same timestamp (#633)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-20 11:11:12 +05:30
Shubham Chaudhary 2134933c03
fix(stderr): adding the fix for cmd.Exec considers log.info as stderr (#632)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-10 21:58:02 +05:30
Shubham Chaudhary d151c8f1e0
chore(sidecar): adding sidecar to the helper pod (#630)
* chore(sidecar): adding sidecar to the helper pod

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* adding support for multiple sidecars

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(sidecar): adding env and envFrom fields

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-10 12:58:57 +05:30
Shubham Chaudhary 3622f505c9
chore(probe): Adding the root cause into probe description (#628)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-09 15:15:14 +05:30
Shubham Chaudhary dc9283614b
chore(sdk): adding failstep and lib changes to sdk (#627)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-16 00:36:10 +05:30
Shubham Chaudhary 5eed28bf3f
fix(vulrn): fixing the security vulnerabilities (#617)
* fix(vulrn): fixing the security vulnerabilities

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-15 17:22:13 +05:30
Shubham Chaudhary 77b30e221e
(chore): Adding user-friendly failsteps and removing non-litmus libs (#626)
* feat(failstep): Adding failstep in all experiments and removing non-litmus libs

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-15 16:42:27 +05:30
Neelanjan Manna eb98d50855
fix(gcp-label-experiments): Fix label filtering logic (#593)
* fix(gcp-label-experiments): fix label filter logic

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-11-24 19:27:46 +05:30
Akash Shrivastava 3e72bb14e9
changed dd to use nsenter (#605)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-24 11:02:36 +05:30
Shubham Chaudhary 115ec45339
fix(pod-delete): fixing pod-delete experiment and refactor workload utils (#610)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-22 17:29:33 +05:30
Shubham Chaudhary 0e18911da6
chore(spring-boot): add spring-boot all faults option and remove duplicate code (#609)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-21 23:39:32 +05:30
Shubham Chaudhary e1eb389edf
Adding single helper and selectors changes to master (#608)
* feat(helper): adding single helper per node


Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-21 22:58:46 +05:30
Akash Shrivastava 39bbdbbf44
assigned msg var (#606)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2022-11-18 14:14:57 +05:30
Shubham Chaudhary ff285178d5
chore(spring-boot): simplifying spring boot experiments env (#604)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-18 11:34:41 +05:30
Soumya Ghosh Dastidar f16249f802
feat: add resource name filtering in k8s probe (#598)
* feat: add resource name filtering in k8s probe

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
2022-11-14 12:49:55 +05:30
Shubham Chaudhary 21969543bf
chore(spring-boot): spliting spring-boot-chaos experiment to separate experiments (#594)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-14 11:30:41 +05:30
Shubham Chaudhary 7140565204
chore(sudo): fixing sudo command (#595)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-07 21:03:09 +05:30
Shubham Chaudhary 920c62d032
fix(dns-chaos): fixing the dns helper logs (#589)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-10-13 19:14:00 +05:30
Neelanjan Manna 7273da979a
Update google apis for GCP experiments and adds DefaultHealthCheck for GCP experiments (#580)
* updated probe default health check for GCP experiments

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-10-12 16:30:30 +05:30
Ashis Kumar Naik 35bb75fda9
Remove Redundant Steady-State Checks in GCP VM Instance Stop experiment (#554) (#585)
* Removed the redundant sanity checks in the GCP VM instance stop experiment in chaoslib, which were originally also defined in the steady-state check function for the experiment.

Signed-off-by: Ashis Kumar Naik <ashishami2002@gmail.com>
2022-10-12 09:45:21 +05:30
Neelanjan Manna e8ec4bd0df
fix(Experiment): Add status logs for GCP experiments (#583)
* added status logs to GCP experiments

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-10-11 10:29:05 +05:30
Shubham Chaudhary f80413639c
feat(dns-chaos): Adding containerd support for dns-chaos (#577)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-10-05 21:56:24 +05:30
Akash Shrivastava ce0ccb5cf8
added default healthcheck condition; Removed redundant code (#579)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2022-10-05 19:54:46 +05:30
Shubham Chaudhary 45b79a8916
chore(httpchaos): Adding support for serviceMesh (#578)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-10-05 14:57:19 +05:30
Udit Gaurav d4e05dbeb7
Chore(checks): Makes the default health check tunable and remove AUT and Aux checks from infra experiments (#576)
Signed-off-by: uditgaurav <udit@chaosnative.com>

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-10-01 00:37:58 +05:30
Udit Gaurav b69ed69aab
Chore(cmd-probe): Use experiment envs and volume in probe pod (#572)
* Chore(cmd-probe): Use experiment envs and volume in probe pod

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-09-29 18:44:00 +05:30
Shubham Chaudhary 5c09e5a36e
feat(ports): Adding source and destination ports support in network experiments (#570)
* feat(ports): Adding source and destination ports support in network experiments

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* Update chaoslib/litmus/network-chaos/helper/netem.go
2022-09-29 17:47:43 +05:30
Shubham Chaudhary 1ec871a62e
chore(httpProbe): Remove responseTimeout field, use the global probeTimeout field instead (#574)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-29 17:08:52 +05:30
Ashis Kumar Naik 50920bef44
Add missing Jitter ENV input for Pod Network Chaos experiments (#563) (#573)
* Added the missing JITTER ENV for the network latency experiment to the experiment structure

* Updated the default value of Network Latency to 2000 ms

Signed-off-by: Ashis Kumar Naik <ashishami2002@gmail.com>
2022-09-29 11:15:39 +05:30
Shubham Chaudhary 25d81a302a
update(sdk): updating operator sdk version (#571)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-27 16:57:26 +05:30
Saptarshi Sarkar c6a153f1fe
Updated README.md file (#559)
* Updated README.md file

Added link to the `License` file.
2022-09-26 11:11:43 +05:30
Tanmay Pandey 805af4f4bc
Fix helper pod issue for Kubelet Experiment (#543)
* Fix helper pod issue for Kubelet Experiment
Signed-off-by: Tanmay Pandey <tanmaypandey1998@gmail.com>
2022-09-26 10:46:37 +05:30
Shubham Chaudhary a83f346ea6
fix(stress): kill the stress process for abort (#569)
* fix(stress): kill the stress process for abort

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-23 16:14:51 +05:30
Shubham Chaudhary af535c19cc
fix(probe): Resiliency Score reaches more than 100 % with Probe failure (#568)
* chore(probe): remove ambiguous attribute phase

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(probe): handle Edge mode when probe failed in prechaos phase but passed in postchaos phase with stopOnFailure set to false

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(probe): fixed the probeSuccessPercentage > 100 issue

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* add the failstep for probe failures

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* fixing onchaos probe to run only for chaos duration

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-21 11:59:43 +05:30
Chinmay Mehta 703f507461
Optimized the logic for duplicate IP check (#565)
Signed-off-by: chinmaym07 <b418020@iiit-bh.ac.in>
2022-09-21 10:37:51 +05:30
Shubham Chaudhary 84854b7851
fix(probe): Converting probeStatus as enum (#566)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-20 18:46:57 +05:30
Shubham Chaudhary f196f763a1
fix(abort): fixing chaosresult annotation conflict while updating chaosresult for abort scenarios (#567)
* fix(result): fix chaosresult update conflict issue

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-20 17:05:36 +05:30
Stéphane Cazeaux b87712921f
Added experiment and lib for spring-boot (#511)
* Added experiment and lib for spring-boot

Signed-off-by: Stéphane Cazeaux <stephane.cazeaux@orange.com>
2022-09-20 14:53:11 +05:30
Akash Shrivastava e3c0492a61
Response Body modification in HTTP Status code experiment (#556)
* added response body in status code; Added content encoding and type in body and status; Removed unnecessary logging

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2022-09-15 18:26:56 +05:30
Udit Gaurav f82e0357af
Chore(sdk): Adds SDK Template for Cloud based experiments (#560)
* Chore(sdk): Adds SDK Template for Cloud based experiments

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
2022-09-15 17:37:28 +05:30
Udit Gaurav 3f0d50813b
Chore(capability): Remove extra capabilities from stress chaos experiments (#557)
* Chore(capability): Remove extra capabilities from stress chaos experiments

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Update run-e2e-on-pr-commits.yml

* Update stress-chaos.go

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-09-14 18:00:23 +05:30
Shubham Chaudhary 0bae5bec27
Deriving podIps of the pods for k8s service if target pod has serviceMesh sidecar (#558)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-12 16:19:09 +05:30
Shubham Chaudhary 718e8a8f18
chore(status): Handling terminated containers (#552)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-05 12:06:35 +05:30
Shubham Chaudhary 158c9a8f63
chore(sdk): Adding service account and helper pod failure check (#553)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-09-05 12:05:51 +05:30
Shubham Chaudhary f3203a8692
chore(history): Converting history field to pointer (#550)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-08-29 12:22:04 +05:30
Akash Shrivastava 671c5e04b8
Added support for status code list in HTTP Chaos (#545)
* Added support for selecting random code from list of codes in status code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Random code logic fix

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* log improvement

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2022-08-16 15:02:52 +05:30
Akash Shrivastava ad6b97f05d
Added toxicity support in HTTP chaos experiments (#544)
* Added toxicity support in HTTP chaos experiments

* Fixed issue with helper not reading toxicity env

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2022-08-16 15:02:24 +05:30
Udit Gaurav dc62a12af1
Fix(pipeline): Fixes e2e pipeline check (#549)
Signed-off-by: uditgaurav <udit@chaosnative.com>

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-08-16 14:34:08 +05:30
Shubham Chaudhary 06312c8893
chore(cmdProbe): Adding imagePullSecrets source cmdProbe (#547)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-08-04 19:46:16 +05:30
Shubham Chaudhary f402bf8f08
chore(sdk): Adding support for helper based chaoslib (#546)
* chore(sdk): Adding support for helper based chaoslib

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(sdk): Adding support for helper based chaoslib

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: ispeakc0de <ashubham314@gmail.com>

* Update contribute/developer-guide/README.md

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2022-08-03 14:10:46 +05:30
Akash Shrivastava 535c1e7d05
Chore[New exp]: HTTP Modify Status Code experiment for K8s (#539)
* Added base code for http status code experiment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Minor fixes in toxiproxy args

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added appns etc in test.yaml

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Update experiments/generic/pod-http-status-code/experiment/pod-http-status-code.go

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>

* Added httpchaostype var

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Removed httpchaostype var and moved log into chaos type specific files

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* added check for status code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* restructured code; fixed random logic; improved logs

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* changed logic for ModifyResponseBody conversion

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* minor readme fix

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
2022-07-15 16:36:41 +05:30
Akash Shrivastava acdbe8126e
Chore[New exp]: HTTP Modify Headers experiment for K8s (#541)
* Added http modify header experiment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added entry in experiment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Minor fixes in toxiproxy args

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* moved tunables logs to specific lib file

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* improved code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* fixed issues in comments

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
2022-07-15 12:04:24 +05:30
Akash Shrivastava af6be30fbd
Chore[New exp]: HTTP Modify Body experiment for K8s (#540)
* Added http modify body experiment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* fixed issue with toxic command

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* fixed log issue

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* moved tunables logs to specific lib file

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* updated operator path

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
2022-07-15 00:26:42 +05:30
Kale Oum Nivrathi 8222832d3d
chore(Probes): Probe enhancements for cmdProbe as a source (#471)
* chore(Probes): Probe enhancements for cmdProbe as a source

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-07-14 20:51:38 +05:30
Shubham Chaudhary 01bc4d93aa
Updating litmus-client and k8s version to 1.21.2 (#542)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-07-14 19:27:33 +05:30
Akash Shrivastava 37748de56c
Chore[New exp]: HTTP Reset Peer experiment for K8s (#534)
* Added pod-http-reset-peer experiment code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added to experiment main; Improved and cleaned code; Improved logs

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added rbac and readme

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Update experiments/generic/pod-http-reset-peer/experiment/pod-http-reset-peer.go

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>

* moved tunables logs to specific lib file

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* removed unused check

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
2022-07-14 18:01:49 +05:30
Gonzalo Reyero Ferreras 9c6c7d1e42
Add missing return to AnnotatedApplicationsStatusCheck (#533)
Signed-off-by: Gonzalo Reyero Ferreras <greyerof@redhat.com>
2022-06-27 18:37:23 +05:30
Udit Gaurav 335b0d064a
Fix node level e2e pipeline to run the ci tests (#529)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-06-15 13:49:35 +05:30
Akash Shrivastava 3b04bfab25
Chore[New exp]: HTTP Chaos for K8s (#524)
* Added base code for httpchaos

Signed-off-by: Vedant Shrotria <vedant.shrotria@harness.io>

* Renamed files; Removed unused env vars; Added and restructures env vars; Restructured helper code; Restructured lib code

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added support for sethelperdata env

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* fixed filename; Improved and cleaned logs

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Restructured to pass argument through separate lib file for new http experiment, no changes in helper lib required

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added exit checks in abort retries; Improved kill proxy; Added kill proxy if start proxy fails

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Improved logs for getcontainerid

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Changed retrying logic

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added readme, test.yaml and rbac.yaml; Fixed gofmt issue in helper.go

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Removed target_host env

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Improved error logging; Improved revert process

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Removed toxiproxy from dockerfile; Improved logs and comment

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added network interface tunable; Made getContainerID runtime based as a standard function

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Changed TARGET_PORT->TARGET_SERVICE_PORT, LISTEN_PORT->PROXY_PORT

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2022-06-14 17:15:46 +05:30
Akash Shrivastava e32af9a434
Chore[Fix]: Node uncordon when app status check failed inside lib (#526)
* Added uncordon step when app status check fails

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Fixed error var issue; Changed deprecated flag from node drain command

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added logs for revert chaos when aut and auxapp check fails

Signed-off-by: Akash Shrivastava <as86414@gmail.com>

* Added [Revert] tag in logs

Signed-off-by: Akash Shrivastava <as86414@gmail.com>
2022-06-14 11:11:00 +05:30
Jimmy Zhang a6435a6bd1
Add --stress-image for stressArgs for pumba lib (#521)
Signed-off-by: Jimmy Zhang <zhang.artur@gmail.com>
2022-06-01 12:54:19 +05:30
Neelanjan Manna 151ca50fe7
added ChaosResult verdict updation step (#523)
Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-06-01 12:53:32 +05:30
Udit Gaurav 111534cf32
Chore(helper pod): Make setHelper data as tunable (#519)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-05-13 09:23:54 +05:30
Akash Shrivastava ffc96c09ae
return error if node not present (#516)
Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
2022-05-11 21:31:37 +05:30
Neelanjan Manna 940c7ffa30
updated appns podlist filtering error handling (#515)
Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Vedant Shrotria <vedant.shrotria@harness.io>
2022-05-10 19:00:57 +05:30
Akash Shrivastava a7694af725
Added Active Node Count Check using AWS APIs (#500)
* Added node count check using aws apis

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added node count check using aws apis to instance terminate by tag experiment

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Log improvements; Code improvement in findActiveNodeCount function;

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added log for instance status check failed in find active node count

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added check if active node count is less than provided instance ids

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
2022-05-10 15:48:58 +05:30
Soumya Ghosh Dastidar 8d43271bd2
fix: updated release workflow (#512)
Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
2022-05-10 15:19:45 +05:30
Shubham Chaudhary 973bb0ea1c
update(sdk): updating litmus sdk for the defaultAppHealthCheck (#513)
Signed-off-by: shubhamc <shubhamc@jfrog.com>

Co-authored-by: shubhamc <shubhamc@jfrog.com>
2022-05-10 15:19:12 +05:30
Neelanjan Manna 817f4d6199
GCP Experiments Refactor, New Label Selector Experiments and IAM Integration (#495)
* experiment init

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated experiment file

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated experiment lib

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated post chaos validation

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated empty slices to nil, updated experiment name in environment.go

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed experiment charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* bootstrapped gcp-vm-disk-loss-by-label artifacts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed device-names input for gcp-vm-disk-loss experiment, added API calls to derive device name internally

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant condition check in gcp-vm-disk-loss experiment pre-requisite checks

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* reformatted error messages

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* replaced the SetTargetInstances function

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added settargetdisk function for getting target disk names using label

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* refactored Target Disk Attached VM Instance memorisation, updated vm-disk-loss and added lib logic for vm-disk-loss-by-label experiment

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added experiment to bin and cleared default experiment name in environment.go

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed charts

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated test.yml

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated AutoScalingGroup to ManagedInstanceGroup; updated logic for checking InstanceStop recovery for ManagedInstanceGroup VMs; Updated log and error messages with VM names

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant computeService code snippets

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* removed redundant computeService code snippets in gcp-disk-loss experiments

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated logic for deriving default gcp sa credentials for computeService

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* updated logging for IAM integration

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* refactored log and error messages and wait for start/stop instances logic

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* fixed logs, optimised control statements, added comments, corrected experiment names

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* fixed file exists check logic

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated instance and device name fetch logic for disk loss

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>

* updated logs

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-04-28 17:54:28 +05:30
Udit Gaurav 85733418d2
Chore(ssm): Update the ssm file path in the Dockerfile (#508)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-04-16 22:42:07 +05:30
Udit Gaurav 6fcb641cca
Chore(warn): Remove warning Neither --kubeconfig nor --master was specified for InClusterConfig (#507)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-04-16 12:46:11 +05:30
Udit Gaurav 0cb4d22e2d
Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#499)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-04-14 13:15:32 +05:30
Udit Gaurav 7d7adcbef7
Fix(container-kill): Adds statusCheckTimeout to container kill recovery (#498)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-04-14 10:10:21 +05:30
Udit Gaurav 1b894e57fc
Fix(targetContainer): Incorrect target container passed in the helper pod for pod level experiments (#496)
* Fix target container issue

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Fix target container issue

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-28 06:56:06 +05:30
Udit Gaurav 433e40d2fb
(enhancement) experiment: add node label filter for pod network and stress chaos (#494)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-16 16:06:10 +05:30
Udit Gaurav 8a63701113
Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill (#493)
* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(randomize): Randomize chaos tunables for schedule chaos and disk-fill

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-15 22:31:01 +05:30
Udit Gaurav 0175a3ce90
Chore(randomize): Randomize stress-chaos tunables (#487)
* Chore(randomize): Randomize stress-chaos tunables

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Update stress-chaos.go
2022-03-15 22:21:21 +05:30
Udit Gaurav 8421105b47
Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment (#491)
* Chore(network-chaos):

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(network-chaos): Randomize Chaos Tunables for Network Chaos Experiment

Signed-off-by: uditgaurav <udit@chaosnative.com>

Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
2022-03-15 14:28:46 +05:30
Udit Gaurav 4e7877bb92
Chore(snyk): Fix snyk security scan on litmus-go (#492)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-15 08:12:22 +05:30
Udit Gaurav 1ee2680988
Chore(cgroup): Add support for cgroup version2 in stress-chaos experiment (#490)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-03-14 16:05:47 +05:30
Udit Gaurav 271de31ce2
Chore(vulnerability): Remove openebs retry module and update pkgs (#488)
* Chore(vulnerability): Fix some vulnerabilities by updating the pkgs

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(vulnerability): Remove openebs retry module and update pkgs

Signed-off-by: udit <udit@chaosnative.com>
2022-03-03 18:24:21 +05:30
Raj Babu Das f12b0b4bb5
Fixing Alpine CVEs by upgrading the version (#486) 2022-02-21 21:24:42 +05:30
Udit Gaurav 027cad6a38
Chore(stress-chaos): Run CPU chaos with percentage of cpu cores (#482)
* Chore(stress-chaos): Run CPU chaos with percentage of cores

Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-02-16 17:51:55 +05:30
Udit Gaurav 4bd3845b12
Chore(network-chaos): Add jitter in pod-network-latency experiment (#478)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2022-01-14 16:48:51 +05:30
Akash Shrivastava db959902aa
Minor changes in Azure Experiments (#476)
* Removed AUT and app check from azure experiments

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Changed azure status functions to accept string values rather than experimentType struct

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Removed unused variables from test.yaml

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added AUT and Auxillary app check

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Updated test.yaml

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
2022-01-12 14:08:56 +05:30
Udit Gaurav 57fef3d3ff
Chore(stress-chaos): Support stress-chaos experiment with custom experiment name (#474)
Signed-off-by: udit <udit@chaosnative.com>
2022-01-03 11:33:25 +05:30
Akash Shrivastava 017fbe143d
Changed the failstep message in experiment template of litmus-sdk to follow the new convention (#475)
Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
2022-01-03 11:24:52 +05:30
Udit Gaurav f2db46f74a
Remove the stress process on timeout without failure (#472)
Signed-off-by: udit <udit@chaosnative.com>
2021-12-15 13:27:56 +05:30
Andrew Hu 07c6647006
fix issue-3350 (#468)
Signed-off-by: Andrew Hu <andrew.hu@hcl.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-12-14 19:34:42 +05:30
Nic Johnson 8b962d41d8
Refactor/experiment contributing (#470)
* docs: add instructions for building litmus-sdk binary

Non Linux AMD64 users will need to build the binary for their target
platform.

Signed-off-by: Nic Johnson <nicjohnson145@hotmail.com>

* docs: update generated code & docs to aid experiment contribution

It wasn't very clear what generated code needed to be kept, and what
generated code needed to be replaced with experiment-specific code.
Attempt to make that more clear by expanding README & adding grep-able
tags inside generated code.

Signed-off-by: Nic Johnson <nicjohnson145@hotmail.com>
2021-12-14 19:28:45 +05:30
Shubham Chaudhary ef6625f15c
fix(helper): removing job-name label from the helper pod (#466)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-11-15 14:20:38 +05:30
Neelanjan Manna 4dbfe5768a
corrected spelling for received (#463)
Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
2021-11-13 22:51:34 +05:30
Shubham Chaudhary 68f942bf49
chore(retry): adding retries for the kubeapi request (#464)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-11-10 10:44:29 +05:30
Udit Gaurav 6ceb607e2c
Fix stress chaos with pumba lib with updated pumba (#462)
Signed-off-by: udit <udit@chaosnative.com>
2021-11-03 08:43:08 +05:30
Shubham Chaudhary 7a6c0b9e3e
fix(event): fixing chaosinject event in container-kill experiment (#461)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-10-28 14:11:58 +05:30
Neelanjan Manna e3ab653bc7
Update failStep Messages to be Patched into ChaosResult (#452)
* updated the failSteps for initial ChaosResult updation, pre-chaos resource status check, and post-chaos resource status check

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>
2021-10-27 14:08:55 +05:30
Shubham Chaudhary 37130c3556
chore(ipv6): Adding support for ipv6 in network experiments (#460)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-10-25 14:12:23 +05:30
Shubham Chaudhary 0f750768d5
chore(sdk): sync the sdk-templates with experiment (#457)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-10-19 22:20:30 +05:30
Shubham Chaudhary d64c3aed89
feat(chaosdetails): Adding common function to get the chaosdetails attributes (#428)
* feat(chaosdetails): Adding common function to get the chaosdetails attributes

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>

* resolving conflicts
2021-10-13 13:06:24 +05:30
Shubham Chaudhary b6d04fbd2b
chore(charts): update readme, contributor guide and github actions (#454)
* chore(charts): update readme, contributor guide and github actions

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>

* chore(repo-health): adding cii best practices

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>

* chore(synk): Adding snyk to check the vulnerabilities

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>

* chore(trivy): remove trivy check from push and release actions

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-10-13 12:57:37 +05:30
John Abraham 179dfd0544
Bug(Experiment Generator): Add AuxiliaryAppInfo to template (#455)
Signed-off-by: dravog7 <dravog78@gmail.com>
2021-10-12 02:59:18 +05:30
Neelanjan Manna f598a43c66
VMWare VM-Poweroff Experiment Enhancements (#449)
* stopVM and startVM moved to vm-operations.go, added error checks in API calls, modified function signatures to remove dependency on experiment-specific types.go

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added chaos injection functionality for multiple VMs; added parallel and serial chaos injection

Signed-off-by: neelanjan00 <neelanjan@chaosnative.com>

* added functionality for waiting through the duration of fully starting and fully stopping the VM
2021-10-01 10:31:28 +05:30
Shubham Chaudhary 186d494e41
chore(logs): fixing logs and error handling (#444)
* chore(logs): fixing logs and error handling

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-09-30 17:24:30 +05:30
Shubham Chaudhary 29e31243a6
chore(labels): passing labels to the helper pod (#451)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-09-30 16:35:36 +05:30
Udit Gaurav 4278b83764
Chore(helper): Add termination grace period seconds to the helper pods and tolerationSeconds in kubelet and docker svc kill experiment helper (#433)
* Chore(helper): Add termination grace period seconds to the helper pods

Signed-off-by: udit <udit@chaosnative.com>

* Add tolerationSeconds in kubelet and docker svc kill experiment

Signed-off-by: udit <udit@chaosnative.com>
2021-09-30 15:51:45 +05:30
Udit Gaurav 45591415b1
Chore(secured-image): Add litmus hardened alpine image as base image (#448)
Signed-off-by: udit <udit@chaosnative.com>
2021-09-29 08:16:40 +05:30
Shubham Chaudhary 8fcd92e7fb
update(version): updating kubernetes version (#450)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-09-24 15:46:52 +05:30
Shubham Chaudhary c40973c395
feat(default-checks): Adding default checks as tunable (#427)
* feat(default-checks): Adding default checks as tunable

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>

* updating chaosengine api

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>

* renaming defaultChecks to defaultAppHealthCheck

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-09-14 14:12:12 +05:30
Shubham Chaudhary 686894d3bf
chore(probe): passing experiment serviceAccount to the probe pod (#443)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-09-13 12:47:41 +05:30
Shubham Chaudhary 81d4ab04ca
chore(e2e): commenting helm-e2e workflow (#445)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-09-13 11:41:17 +05:30
Shubham Chaudhary 134b131230
(result): adding experiment & instance-id inside helper pod (#441)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-09-02 21:19:43 +05:30
gpsingh-1991 073efe2436
New experiment for node restart using redfish API (#404)
Signed-off-by: Gurpreet Singh <mail2singhgurpreet@gmail.com>
* Update experiments/baremetal/redfish-node-restart/experiment/redfish-node-restart.go
2021-09-01 15:17:57 +05:30
Neelanjan Manna 28dd78529d
Functionality Update for GCP VM Instance Stop Experiment in case of Auto-Scaling Group (#406)
* Bootstrap project created; GCP compute engine library, context library and option library added

Signed-off-by: neelanjan00 <neelanjanmanna@gmail.com>
2021-08-30 14:34:35 +05:30
Shubham Chaudhary 09bf16d95a
(docs): updating docs links of all experiments (#439)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-08-26 11:46:45 +05:30
Shubham Chaudhary 556590c405
feat(network-partition): Adding pod-network-partition experiment (#386)
* feat(network-partition): Adding pod-network-partition experiment

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

* update(rbac): updating rbac for the minimal permissions

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

* feat(experiment): adding podselector and namespace selectors

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-08-25 08:20:06 +05:30
Akash Shrivastava 8f9653eb8f
Chore(Azure): Virtual Disk loss experiment (#395)
* Added virtual disk loss experiment for azure

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added serial/parallel mode for disk loss experiment;
Added disk status function

* Fixed issue with abortWatcher;
Added comments

* Added env fields in test.yml

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added readme

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added probes during chaos (was missing in code before)

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added pre-chaos check for disk

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Changed experiment name to azure-disk-loss

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Removed default values from environment.go

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added comments;
Refactored code for better readability

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* moved code for fetching disk list out of switch case

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Restructured azure experiment files and folders

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Changed experiment to work with only disk names, no need to provide instance name

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added support for AKS nodes/VMSS

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Fixed abortwatcher to now check for VM state also

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added comments

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Changed IsScaleSet to ScaleSet and it now accepts enable/disable as value;
Minor changes and restructuring

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Changes as per review

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Updated azure sdk version in chaoslib file

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Added error checking in abort watcher for retry statement

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Minor changes in logs

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
2021-08-24 13:02:18 +05:30
Rémi ZIOLKOWSKI 33148e580c
Add DockerSocket to network experiments && remove chmod 777 (#437)
* Add DockerSocket to network experiments && remove chmod 777

Signed-off-by: Rémi ZIOLKOWSKI <remi.ziolkowski-ext@pole-emploi.fr>

* sudo work

Signed-off-by: Rémi ZIOLKOWSKI <remi.ziolkowski-ext@pole-emploi.fr>

* fix indent

Signed-off-by: Rémi ZIOLKOWSKI <remi.ziolkowski-ext@pole-emploi.fr>

* indent

Signed-off-by: Rémi ZIOLKOWSKI <remi.ziolkowski-ext@pole-emploi.fr>

* :D gofmt -w .

Signed-off-by: Rémi ZIOLKOWSKI <remi.ziolkowski-ext@pole-emploi.fr>
2021-08-19 23:22:50 +05:30
Shubham Chaudhary 324a88dfef
(docs): adding host-network for the cmdProbe (#436)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-08-18 10:29:27 +05:30
Shubham Chaudhary 3c9a326200
feat(experiment): derive parent name in pod-delete only (#434)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-08-13 17:49:03 +05:30
Shubham Chaudhary 71b3c41ccc
feat(target-pods): validating target pods env (#430)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-08-11 15:02:00 +05:30
Shubham Chaudhary a15c04db8d
feat(exec-chaos): handle the abort signal (#425)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-08-11 14:59:38 +05:30
Shubham Chaudhary 50c0d72a9a
feat(pre-chaos): creating prechaos and postchaos event for the neg cases (#422)
* feat(pre-chaos): creating prechaos event for the neg cases

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>

* feat(chaos): Added changes in sdk

Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-08-11 14:57:59 +05:30
Shubham Chaudhary a259a14064
feat(helper-status): fix helper status timeout when helper pod failed (#421)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>
2021-08-11 14:57:11 +05:30
OUM NIVRATHI KALE cc500239f4
fixing bugs (#418)
Signed-off-by: Oum Kale <oumkale@chaosnative.com>

Co-authored-by: Shubham Chaudhary <shubham@chaosnative.com>
2021-08-11 14:56:49 +05:30
Udit Gaurav 3786e9af94
Chore(lint): Add Golang lint checker in CI and fix the current lint failure (#417)
* Chore(lint): Add Golang lint check

Signed-off-by: udit <udit@chaosnative.com>

* Fix gofmt

Signed-off-by: udit <udit@chaosnative.com>

Co-authored-by: Shubham Chaudhary <shubham@chaosnative.com>
2021-08-11 14:56:07 +05:30
Akash Shrivastava 4d41f34adb
Pod experiment fixes (#429)
* Fixed issue with pod affected percentage > 100

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
2021-08-11 14:14:17 +05:30
Neelanjan Manna 21be0d9a4b
Auxiliary Application Check for node-memory-hog Experiment and Multiple Target Node Functionality Fix for three Node-Level Experiments (#424)
* Fixed multiple node selection functionality in nodes.go

Signed-off-by: neelanjan00 <neelanjanmanna@gmail.com>

* Added Auxiliary Application Status Check to node-memory-hog experiment

Signed-off-by: neelanjan00 <neelanjanmanna@gmail.com>
2021-08-09 11:28:08 +05:30
Akash Shrivastava 6bff393c45
Chore(Azure): Azure Instance Stop experiment (support for: virtual machine scale sets) (#403)
* Added support for virtual machine scale sets (for aks nodes)

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>
2021-08-05 13:48:12 +05:30
Neelanjan Manna f3997aa2dd
GCP VM Disk Loss Experiment (#397)
* Bootstrap project created; GCP compute engine library, context library and option library added

* VM instance status methods and VM instance operation methods added

* Defined types.go; removed default template code for chaoslib>litmus>vm-instance-stop>lib>vm-instance-stop.go and experiments>gcp>vm-instance-stop>experiment>vm-instance-stop.go

* updated function parameter names and variable names in vm-instance-status.go

* vm-instance-stop experiment chaoslib methods PrepareGCPTerminateByName, injectChaosInSerialMode, injectChaosInParallelMode, and abortWatcher added

* AuxiliaryAppInfo field added in types.go and correspondingly in environment.go

* Modified function PrepareGCPTerminateByName identifier to PrepareVMTerminateByName

* Modified function PrepareVMTerminateByName to PrepareVMStop

* VMInstanceStop experiment added

* Added VMInstanceStop experiment to the main function

* engine.yaml, experiment.yaml, and rbac.yaml charts updated

* Corrected mount path in experiment.yaml

* Functionality for fetching secret files from Kubernetes secret volume for the purpose of auth added

* Updated engine.yaml chart

* Initialized default value for TOTAL_CHAOS_DURATION, CHAOS_INTERVAL, and RAMP_TIME in experiment.yaml

* Updated function VMInstanceStop GetENV and corresponding logs; Updated GetServiceAccountJSONFromSecret to return proper credentials

* removed default comments from experiment.go

* corrected yaml syntax in experiment.yaml

* renamed ChaosEngine name to gcp-vm-chaos in engine.yaml

* updated experiment description message in experiment.yaml

* Replaced logo for GCP and VM Instance Delete experiment

* Bootstrap project files created

* functions DiskVolumeDetach and DiskVolumeAttach defined in disk-operations.go

* defined GetVolumeAttachmentDetails function and function comments to disk-operations.go; defined WaitForVolumeDetachment, WaitForVolumeAttachment GetDiskVolumeState, DiskVolumeStateCheckByName, and CheckDiskVolumeDetachmentInitialisation functions in disk-volume-status.go

* Updated order of logs and comments for disk-operations.go

* Updated comments and logs in disk-volume-status.go

* Defined types.go and experiment.go, cleared the boiler plate code for experiment and chaoslib

* Renamed variables in CheckDiskVolumeDetachmentInitialisation function in disk-volume-status.go

* Renamed instanceZone variable to zone in disk-operations.go

* Renamed instanceZone variable to zone in disk-volume-status.go

* vm-disk-loss.go functions defined: PrepareDiskVolumeLossByName, InjectChaosInSerialMode, InjectChaosInParallelMode, and AbortWatcher

* renamed PrepareDiskVolumeLossByName and DiskVolumeStateCheckByName in vm-disk-loss.go and disk-volume-status.go

* VolumeDiskLoss function defined in vm-disk-loss.go

* vm-disk-loss experiment added to experiment.go

* updated log output format and added null check in GetVolumeAttachmentDetails in disk-operation.go; updated DiskVolumeStateCheck to validate number of disks, zones and device names and updated error statements in disk-volume-status.go; replaced VolumeDetach method with VolumeAttach method in vm-disk-loss.go

* Validation check added for empty project id in DiskVolumeStateCheck

* removed RunID from types.go

* engine.yaml updated

* experiment.yaml updated

* Updated rbac.yaml

* terminate replaced to stop in a log in vm-instance-stop.go

* Updated comments in vm-instance-stop.go

* Updated comments in vm-instance-stop.go

* Updated comments in vm-instance-stop.go

* Removed attributes.yaml

* removed the charts directory

* comment added to VMInstanceStop function in vm-instance-stop.go

* AbortWatcherWithoutExit function call removed as it was called twice in vm-instance-stop.go

* Event generation of Result as Awaited shifted after SetResultUID function call in vm-instance-stop.go

* post chaos vm instance status log moved outside the loop

* test.yml updated

* VM_INSTANCE_NAME and INSTANCE_ZONE replaced with their plurals

* removed ActiveNodes from types.go

* ManagedNodegroup replaced by AutoScalingGroup

* Removed attributes.yaml

* added getFileContent function to get-credentials-json.go; implemented marshalling for creating json byte slice credentials

* Modified log statement in vm-instance-status.go

* Signing the commit for vm-instance-stop experiment

* changed DiskVolumeName, DiskZone and DeviceName variables to plural

* updated engine.yaml, experiment.yaml and test.yml

* Removed charts directory

* Signing vm-disk-loss experiment

* renamed the experiment to gcp-vm-instance-stop

* Replaced the revert chaos status

* Moved PreChaosNodeStatusCheck, PostChaosActiveNodeCountCheck, and getActiveNodeCount functions to nodes.go; updated the gcp-vm-instance-stop experiment for the same.

* Moved PreChaosNodeStatusCheck, PostChaosActiveNodeCountCheck, and getActiveNodeCount functions to nodes.go; updated the ec2-terminate-by-id and ec2-terminate-by-tag.go experiments for the same.

* Implemented the WaitForDuration method for waiting through the chaos interval in serial and parallel mode

* Replaced Sequence with Zone in experiment log

* Updated error statement

* Removed redundant chaos setTarget

* error message updated

* Updated ramp time and chaos interval default values in environment.go

* Renamed vm-disk-loss experiment to gcp-vm-disk-loss experiment

* Updated chaos interval value to 30 from 10 in experiment.go

* Corrected gcp-vm-disk-loss experiment name in experiment.go

* cleaned unused dependencies

* go.sum modified post merge

* diskNamesList split function moved outside switch case; updated inject chaos function identifiers to lower case

* replaced instanceNamesList[] as a global variable in gcp-vm-diskloss.go

* VolumeDiskDetachmentInitialization check removed

* Removed ramp time from experiment logs

* Updated experiment charts for new experiment name

* ramp time set to 0 in test.yml

* Updated DiskVolumeStateCheck function

* Removed VolumeAffectedPerc from types.go and environment.go

* Updated a comment

* Updated an error log statement

* Updated function signatures

* Moved instance names preparation steps to PrepareDiskVolumeLoss function

* corrected a typo in an error log

* Removed RunID from types.go

* instanceNamesList moved inside PrepareDiskVolumeLoss function

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-07-14 15:21:20 +05:30
gpsingh-1991 79207d9bae
Add new experiment for fio stress test (#383)
* Add new experiment for fio stress test

Add a new experiment to perform the fio stress test on the target
container via litmus.

Signed-off-by: Gurpreet Singh <mail2singhgurpreet@gmail.com>

* Update chaoslib/litmus/pod-fio-stress/lib/pod-fio-stress.go

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>

* Update experiments/generic/pod-fio-stress/README.md

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>

* Update README.md

* Add new experiment for fio stress test

Add a new experiment to perform the fio stress test on the target
container via litmus.

Signed-off-by: Gurpreet Singh <mail2singhgurpreet@gmail.com>

* Fix minor issues in pod-fio-stress chaos

- Update error message
- Remove ChaosInjectCmd as it is not being used
- Update ChaosKillCmd to killall instead of kill
- Update monitoring steps
- Remove AUXILIARY_APPINFO from test.yml
- Update abort watcher to AbortWatcherWithoutExit
- Validate stressErr
- Add appropriate comment

Signed-off-by: Gurpreet Singh <mail2singhgurpreet@gmail.com>

* Update chaoslib/litmus/pod-fio-stress/lib/pod-fio-stress.go

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>

* Fix minor issues

-- Add stressErr check for serial chaos execution
-- Add comments

Signed-off-by: Gurpreet Singh <mail2singhgurpreet@gmail.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Shubham Chaudhary <shubham@chaosnative.com>
2021-07-14 08:14:51 +05:30
Neelanjan Manna 16a35986c1
GCP VM Instance Stop Experiment (#388)
* Bootstrap project created; GCP compute engine library, context library and option library added

* VM instance status methods and VM instance operation methods added

* Defined types.go; removed default template code for chaoslib>litmus>vm-instance-stop>lib>vm-instance-stop.go and experiments>gcp>vm-instance-stop>experiment>vm-instance-stop.go

* updated function parameter names and variable names in vm-instance-status.go

* vm-instance-stop experiment chaoslib methods PrepareGCPTerminateByName, injectChaosInSerialMode, injectChaosInParallelMode, and abortWatcher added

* AuxiliaryAppInfo field added in types.go and correspondingly in environment.go

* Modified function PrepareGCPTerminateByName identifier to PrepareVMTerminateByName

* Modified function PrepareVMTerminateByName to PrepareVMStop

* VMInstanceStop experiment added

* Added VMInstanceStop experiment to the main function

* engine.yaml, experiment.yaml, and rbac.yaml charts updated

* Corrected mount path in experiment.yaml

* Functionality for fetching secret files from Kubernetes secret volume for the purpose of auth added

* Updated engine.yaml chart

* Initialized default value for TOTAL_CHAOS_DURATION, CHAOS_INTERVAL, and RAMP_TIME in experiment.yaml

* Updated function VMInstanceStop GetENV and corresponding logs; Updated GetServiceAccountJSONFromSecret to return proper credentials

* removed default comments from experiment.go

* corrected yaml syntax in experiment.yaml

* renamed ChaosEngine name to gcp-vm-chaos in engine.yaml

* updated experiment description message in experiment.yaml

* Replaced logo for GCP and VM Instance Delete experiment

* terminate replaced to stop in a log in vm-instance-stop.go

* Updated comments in vm-instance-stop.go

* Updated comments in vm-instance-stop.go

* Updated comments in vm-instance-stop.go

* Removed attributes.yaml

* removed the charts directory

* comment added to VMInstanceStop function in vm-instance-stop.go

* AbortWatcherWithoutExit function call removed as it was called twice in vm-instance-stop.go

* Event generation of Result as Awaited shifted after SetResultUID function call in vm-instance-stop.go

* post chaos vm instance status log moved outside the loop

* test.yml updated

* VM_INSTANCE_NAME and INSTANCE_ZONE replaced with their plurals

* removed ActiveNodes from types.go

* ManagedNodegroup replaced by AutoScalingGroup

* added getFileContent function to get-credentials-json.go; implemented marshalling for creating json byte slice credentials

* Modified log statement in vm-instance-status.go

* Signing the commit for vm-instance-stop experiment

* renamed the experiment to gcp-vm-instance-stop

* Replaced the revert chaos status

* Moved PreChaosNodeStatusCheck, PostChaosActiveNodeCountCheck, and getActiveNodeCount functions to nodes.go; updated the gcp-vm-instance-stop experiment for the same.

* Moved PreChaosNodeStatusCheck, PostChaosActiveNodeCountCheck, and getActiveNodeCount functions to nodes.go; updated the ec2-terminate-by-id and ec2-terminate-by-tag.go experiments for the same.

* Implemented the WaitForDuration method for waiting through the chaos interval in serial and parallel mode

* Replaced Sequence with Zone in experiment log

* Updated error statement

* Removed redundant chaos setTarget

* error message updated

* Updated ramp time and chaos interval default values in environment.go

* Added default ramp time of 0 in environment.go

* cleaned unused dependencies

* removed unused go dependencies

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-07-13 09:52:51 +05:30
Akash Shrivastava 5c35333a95
Chore(Azure): Update and Shift Azure instance stop chaos to master (#377)
* Chore(new_exp): Add Azure instance terminate chaos (#296)

* Chore(new_exp): Add Azure instance terminate chaos

Signed-off-by: udit <udit.gaurav@mayadata.io>

* Added support for multiple instance ids in azure instance terminate; Added abortWatcher code for azure instance terminate experiment

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Fix minor filename issue

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Removed vendors

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Made changes as per suggestion in PR

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Changed instance status display logic to not use loop

* Reverted experiment.go values to default

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

* Improved logs

Signed-off-by: Akash Shrivastava <akash@chaosnative.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-07-12 19:23:43 +05:30
Shubham Chaudhary 0a5e227031
fix(stress-chaos): fixing the stress chaos for abort cases and target details (#399)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-07-10 12:12:03 +05:30
Shubham Chaudhary eec9e99631
chore(nonchaos-pods): filter all the nonchaos-pods for target pod selection (#401)
Signed-off-by: shubham chaudhary <shubham@chaosnative.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-07-10 11:53:55 +05:30
Udit Gaurav 095e1d2a96
Chore(build): Build Go binary inside image (#394)
* Chore(build): Build Go binary inside image

Signed-off-by: udit <udit@chaosnative.com>

* remove unused Dockerfile

Signed-off-by: udit <udit@chaosnative.com>

* update go version to 1.16

Signed-off-by: udit <udit@chaosnative.com>

* update go version to 1.16

Signed-off-by: udit <udit@chaosnative.com>

Co-authored-by: Shubham Chaudhary <shubham@chaosnative.com>
2021-07-09 18:02:55 +05:30
Shubham Chaudhary babf0c4aa5
fix(ec2): removing the duplicate function calls (#393)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-07-05 12:04:41 +05:30
Jakub Stejskal a87a43801f
Add option to change default container registry for image build (#390)
* Add option to change default container registry during image build

Signed-off-by: Jakub Stejskal <xstejs24@gmail.com>
2021-06-29 11:44:30 +05:30
Udit Gaurav d22245b565
Chore(stress-chaos): Remove extra privileges and update io stressors and timeout (#391)
* Chore(stress-chaos): Remove extra privileges and update io stressors and timeout

Signed-off-by: udit <udit@chaosnative.com>

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
2021-06-28 14:44:05 +05:30
Shubham Chaudhary d45c8d701d
feat(annotation): skipping parent resources listing, if annotationCheck is false (#387)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-06-28 14:36:45 +05:30
Udit Gaurav 090049152a
Fix: Populate total chaos duration to helper in dns experiments (#389)
Signed-off-by: udit <udit@chaosnative.com>
2021-06-23 12:47:38 +05:30
Shubham Chaudhary 402a4687dc
fix(log): fixed the comparator logs (#384)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-06-17 14:40:58 +05:30
Udit Gaurav 2fc4266561
Chore(New_Exp): Add stress chaos experiments and split out the exec stress experiments (#368)
* Rename pod-cpu-hog and pod-memory-hog experiment with pod-cpu-hog-exec and pod-memory-hog-exec respectively

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Add stress chaos without execing into the target container

Signed-off-by: uditgaurav <udit@chaosnative.com>

* add helper logs

Signed-off-by: uditgaurav <udit@chaosnative.com>

* update the abort logic

Signed-off-by: udit <udit@chaosnative.com>

* refactor and update the code

Signed-off-by: udit <udit@chaosnative.com>

* update Dockerfile

Signed-off-by: udit <udit@chaosnative.com>

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
2021-06-15 20:01:37 +05:30
Udit Gaurav b4bcbc0ed6
Chore(Dockerfile): Update Dockerfile to take binaries from test-tool release build (#380)
Signed-off-by: udit <udit@chaosnative.com>
2021-06-15 17:07:58 +05:30
Udit Gaurav 0fd14c2839
Chore(New_Exp): Add AWS SSM Chaos Experiment (#376)
* Chore(New_Exp): Add aws ssm chaos experiment

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Chore(New_Exp): Add AWS SSM Chaos Experiment

Signed-off-by: udit <udit@chaosnative.com>

* Add minor fix

Signed-off-by: udit <udit@chaosnative.com>

* update ENV name for cpu and default docs path

Signed-off-by: udit <udit@chaosnative.com>
2021-06-15 14:58:30 +05:30
VEDANT SHROTRIA e229c73348
Added docker-service-kill experiment implementation in litmus-go. (#379)
* Added docker-svc-kill implementation in litmus-go.

Signed-off-by: Jonsy13 <vedant.shrotria@chaosnative.com>
2021-06-12 16:59:47 +05:30
Shubham Chaudhary a0c5d5dc47
chore(experiment): Adding target details inside chaosresult for the experiments which doesn't contain helper pod (#341)
* chore(experiment): Adding target details inside chaosresult for the experiments which doesn't contain helper pod

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

* chore(experiment): Adding target details inside chaosresult for the pumba helper

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-06-12 16:04:33 +05:30
Shubham Chaudhary b241c84edd
chore(experiment): Adding target details inside chaosresult for the experiments which contains helper pod (#342)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-06-11 18:46:24 +05:30
Shubham Chaudhary 51c3574d78
chore(pod-delete): Adding target details inside chaosresult (#336)
* chore(pod-delete): Adding target details inside chaosresult

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

* chore(pod-delete): Adding target details inside chaosresult for pod-autoscaler

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-06-11 18:01:34 +05:30
Shubham Chaudhary 10de26b38d
chore(sdk): updating sdk (#378)
* chore(sdk): updating sdk

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-06-11 11:00:46 +05:30
Shubham Chaudhary 86571de3f6
chore(chaosresult): updating verdict and status in chaosengine and chaosresult (#375)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-06-09 11:37:26 +05:30
Shubham Chaudhary e0a3102b94
chore(helper): Adding statusCheckTimeouts for the helper status check (#373)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-06-08 14:49:44 +05:30
Shubham Chaudhary c323334c61
chore(probe): adding probe abort (#370)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-06-01 21:36:56 +05:30
Udit Gaurav ba3deb2518
Chore(e2e): Update e2e workflows and add more node level tests (#372)
* Chore(e2e): Update e2e workflows and add more node level tests

Signed-off-by: uditgaurav <udit@chaosnative.com>

* add kind config

Signed-off-by: uditgaurav <udit@chaosnative.com>
2021-06-01 21:35:06 +05:30
Shubham Chaudhary 82837bc84f
chore(image): reduce the go-runner image size (#371)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-28 15:39:36 +05:30
Shubham Chaudhary b42a6676ed
rm(vendor): removing the vendor directory from litmus-go (#366)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-26 11:09:40 +05:30
Shubham Chaudhary 7f6b83b313
chore(env): updated the env setter function (#365)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-26 10:37:26 +05:30
Snyk bot 956e949b01
fix: vendor/golang.org/x/net/http2/Dockerfile to reduce vulnerabilities (#364)
The following vulnerabilities are fixed with an upgrade:
- https://snyk.io/vuln/SNYK-UBUNTU1404-OPENSSL-1049144
- https://snyk.io/vuln/SNYK-UBUNTU1404-SUDO-1065770
- https://snyk.io/vuln/SNYK-UBUNTU1404-SUDO-406981
- https://snyk.io/vuln/SNYK-UBUNTU1404-SUDO-473059
- https://snyk.io/vuln/SNYK-UBUNTU1404-SUDO-546522

Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
2021-05-25 11:27:13 +05:30
Snyk bot 27df00ac11
fix: vendor/golang.org/x/net/http2/Dockerfile to reduce vulnerabilities (#361)
The following vulnerabilities are fixed with an upgrade:
- https://snyk.io/vuln/SNYK-UBUNTU1404-OPENSSL-1049144
- https://snyk.io/vuln/SNYK-UBUNTU1404-SUDO-1065770
- https://snyk.io/vuln/SNYK-UBUNTU1404-SUDO-406981
- https://snyk.io/vuln/SNYK-UBUNTU1404-SUDO-473059
- https://snyk.io/vuln/SNYK-UBUNTU1404-SUDO-546522

Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
2021-05-25 11:03:59 +05:30
Shubham Chaudhary 392549cd3f
chore(contribution): Adding contribution guide, bch check, issue & PR templates (#367)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-25 11:03:29 +05:30
Udit Gaurav 4f2e3e754c
Fix sequence env in kafka broker pod experiment (#369)
* Fix sequence env in kafka broker pod experiment

Signed-off-by: uditgaurav <udit@chaosnative.com>

* add pod affected percentage env

Signed-off-by: uditgaurav <udit@chaosnative.com>
2021-05-25 10:31:05 +05:30
Shubham Chaudhary d6d4797d4d
chore(dns): adding spoofmap env in helper pod (#363)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-17 10:24:08 +05:30
Shubham Chaudhary 80126c1d0f
refactor(chaoslibs): Refactored aws & vmware experiments (#353) (#359)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-15 16:56:38 +05:30
Shubham Chaudhary 78aeb39c3f
refactor(chaoslibs): Refactored aws & vmware experiments (#353)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-15 16:24:25 +05:30
Shubham Chaudhary 8441681a8b
refactor(chaoslibs): Refactored all the chaoslibs (#349)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-15 15:40:51 +05:30
Shubham Chaudhary 689ff65f97
refactor(chaoslibs): Refactored pumba chaoslibs and utils (#352)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-15 12:56:00 +05:30
Shubham Chaudhary 1bb936e461
refactor(chaoslibs): Refactored generic node experiments (#351)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-15 08:45:00 +05:30
Udit Gaurav 763a781732
Chore(abort): Add abort chaos support for kube-aws experiments (#347)
* Chore(abort): Add abort chaos support for kube-aws experiments

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Add abort signals

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Update abort logic and also include WaitForDown before the actual recovery

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Resolve conflict

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Update ec2-terminate-by-id.go

* Resolve Conflict

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Resolve Conflict

Signed-off-by: uditgaurav <udit@chaosnative.com>

* Make common abort for ebs

Signed-off-by: uditgaurav <udit@chaosnative.com>

Co-authored-by: Shubham Chaudhary <shubham.chaudhary@mayadata.io>
2021-05-14 22:50:41 +05:30
Udit Gaurav c10cde2595
Chore(New_exp): Add EBS Loss experiment using Tags (#354)
* Chore(New_exp): Add EBS Loss experiment using Tags

Signed-off-by: uditgaurav <udit@chaosnative.com>
2021-05-14 12:50:03 +05:30
Soumya Ghosh Dastidar 227599bf5f
Updated DNS chaos (#357)
* added dns spoof chaos

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
2021-05-13 20:55:00 +05:30
Shubham Chaudhary 7d69572900
rm(README): Removed the outdated readme (#358)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-13 20:11:13 +05:30
Udit Gaurav 96435dd3b3
Chore(EC2-checks): Update the ec2 terminate with tags experiment to target only running instances (#350)
Signed-off-by: uditgaurav <udit@chaosnative.com>
2021-05-08 20:28:56 +05:30
Shubham Chaudhary 3a9997e2e8
chore(helper): updating the helper pod status check (#355)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-05-07 17:49:59 +05:30
Shubham Chaudhary 788b26f5b4
chore(stress): fixing 137 error in cpu & memory chaos (#339)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-04-30 21:45:37 +05:30
Shubham Chaudhary 1b2eeff520
chore(node-label): Adding ability to filter target node by label (#345)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-04-30 20:48:22 +05:30
Udit Gaurav 1bdee32753
Chore(update): Enhance pod-autoscaler logs and update helper pod name (#337)
* Chore(update): Enhance pod-autoscaler logs and update helper pod name

Signed-off-by: uditgaurav <udit@chaosnative.com>
2021-04-30 19:42:20 +05:30
iassurewipro 138914dba3
Chore(New_exp): Adding VMWare VMPowerOff Experiment (#346)
* Chore(New_exp): Adding VMWare VMPowerOff Experiment

Signed-off-by: DelphineJoyneer <golkonda.joyneer@wipro.com>

* Adding events for chaosinjection

Signed-off-by: DelphineJoyneer <golkonda.joyneer@wipro.com>

* Added abort and some minor fixes

Signed-off-by: DelphineJoyneer <golkonda.joyneer@wipro.com>

* Added a note for VSphere Version

Signed-off-by: DelphineJoyneer <golkonda.joyneer@wipro.com>

Co-authored-by: DelphineJoyneer <golkonda.joyneer@wipro.com>
2021-04-30 18:24:07 +05:30
Shubham Chaudhary 03a0854fc7
chore(exec): execing inside target pod only if it is in ready state (#343)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-04-28 17:53:01 +05:30
Shubham Chaudhary d0b64a8a86
chore(disk-fill): Adding Block size as tunable in disk-fill (#344)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-04-28 17:49:58 +05:30
Shubham Chaudhary 574edbbfa2
chore(node-restart): updating the node-restart experiment (#335)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-04-26 09:52:06 +05:30
Udit Gaurav de242b0d63
Chore(new_experiment): Split EC2 terminate experiment to select multiple instances using tags or list in different sequence modes (#327)
* Chore(ec2): Enhance EC2 terminate experiment to select multiple instances using tags or list in different sequence modes

Signed-off-by: udit <udit@chaosnative.com>

* Add separate experiments for ec2 terminate by id and by tags

Signed-off-by: uditgaurav <udit@chaosnative.com>

* update test.yml

Signed-off-by: uditgaurav <udit@chaosnative.com>

* update service account name in test.yml

Signed-off-by: uditgaurav <udit@chaosnative.com>
2021-04-15 21:50:41 +05:30
Karthik Satchitanand c4a0e3ff31
(enhancement)stress-image: add env for stress-image used in pumba lib (#331)
Signed-off-by: ksatchit <karthik.s@mayadata.io>
2021-04-15 21:01:30 +05:30
Shubham Chaudhary f2aed85e9d
fix(logging): modifying the Fatal to Error (#330)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-04-15 21:01:10 +05:30
Shubham Chaudhary 71b0eeae0a
update(sdk): updating the sdk scaffold (#329)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-04-15 18:52:09 +05:30
Soumya Ghosh Dastidar c79b9ea0fd
Added Pod DNS Chaos (#328)
* Added DNS Chaos

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

* Added termination grace period env

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

* Updated abort watcher

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>

* added change requests

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
2021-04-15 09:54:01 +05:30
Shubham Chaudhary 33ad12502c
feat(revert): Adding chaos revert steps on experiment abortion (#318)
* feat(revert): Adding chaos revert steps for experiment abortion

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

* chore(revert): Adding revert logic inside pod-cpu-hog experiment

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
2021-04-12 09:35:53 +05:30
Shubham Chaudhary 34443dcb63
chore(helper): fixing waitForCompletion function to handle failed helpers in case of multiple targets (#326)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

Co-authored-by: Karthik Satchitanand <karthik.s@mayadata.io>
2021-04-10 13:22:41 +05:30
Udit Gaurav e0af51360c
Chore(stress): Add stress envs for stress supported experiments (#325)
* Chore(stress): Add stress envs for stress supported experiments

Signed-off-by: udit <udit@chaosnative.com>

* Correct lint failure

Signed-off-by: udit <udit@chaosnative.com>

* Update environment.go

Co-authored-by: udit <udit@chaosnative.com>
2021-04-10 12:11:59 +05:30
Shubham Chaudhary fcb4a8c76f
chore(probe): stop failing of other probes of same phase if one probe fails (#324)
* chore(probe): stop failing of other probes of same phase if one probe fails

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>

* updating the chaos-operator vendors

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-04-07 16:48:52 +05:30
Udit Gaurav a7481fad95
Chore(actions): Update GitHub Chaos Actions (#320)
Signed-off-by: udit <udit@chaosnative.com>
2021-04-07 15:08:20 +05:30
Shubham Chaudhary 662230cc2e
rm(network-latency): Removed duplicate network-latency experiment (#322)
* rm(network-latency): Removed duplicate network-latency experiment

Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-04-07 15:05:40 +05:30
Shubham Chaudhary 8d7cee05f2
feat(randomness): Adding randomness interval inside pod-delete experiment (#321)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-03-25 23:21:29 +05:30
Shubham Chaudhary 1ccaee37e1
chore(applabels): Erroring out if applabel is not provided and annotationCheck is false (#319)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-03-25 20:49:41 +05:30
Udit Gaurav 7020ce59f2
enhance(aws-ec2): Add support for terminating node of a cluster with self-managed nodegroup (#298)
* enhance(aws-ec2): Add support for terminating a node of a cluster with self-managed nodegroup

Signed-off-by: udit <udit@chaosnative.com>
2021-03-15 16:23:19 +05:30
Shubham Chaudhary b8f5d70c63
chore(container-status): checking only target container status (#303)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-03-15 14:19:00 +05:30
OUM NIVRATHI KALE e659d0e6d8
api response time updated for http probe (#307)
Signed-off-by: oumkale <oum.kale@mayadata.io>
2021-03-15 13:45:38 +05:30
Shubham Chaudhary 7273dc258b
chore(k8sProbe): Updating the k8s probe schema (#308)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-03-15 12:08:25 +05:30
Radu Domnu 5975e572f8
Remove init container for changing permission of the container runtime socket (#315)
* Removed init container from container-kill and network-chaos experiments

Signed-off-by: Radu Domnu <rdomnu@redhat.com>

* Running crictl/docker commands with sudo

Signed-off-by: Radu Domnu <rdomnu@redhat.com>

* Removed init container for network experiments

Signed-off-by: Radu Domnu <rdomnu@redhat.com>
2021-03-15 12:04:47 +05:30
Shubham Chaudhary c049b15639
refactor(kafka-broker-pod-failure): Refactor the kafka broker pod failure (#309)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-03-15 12:03:46 +05:30
Shubham Chaudhary 524f1dfc1f
chore(disk-fill): Adding option to specify ephemeral storage (MiB) explicitly via env (#313)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-03-15 12:03:25 +05:30
Shubham Chaudhary 587fd107e6
chore(signal): Adding signal for the crio/containerd runtime (#306)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-03-08 21:12:34 +05:30
Shubham Chaudhary 4fc421d658
chore(n/w-chaos): Handling the unknown hosts (#302)
Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>
2021-03-05 12:10:07 +05:30
Shubham Chaudhary 8e0d5858df
chore(aut-check): skip autStatus check if appinfo not provided (#304)
Signed-off-by: shubhamchaudhary <shubham@chaosnative.com>
2021-03-04 12:38:31 +05:30
Shubham Chaudhary 84b91467be
feat(abort): Adding chaos revert inside network-chaos (#297)
Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>
2021-02-24 21:28:01 +05:30
Shubham Chaudhary 8fc2d4d9b6
refactor(comparator): refactor the comparator and added oneof operator (#283)
Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>
2021-02-22 11:56:48 +05:30
Shubham Chaudhary 2b384d7013
feat(status): Checking status of annotated applications only (#293)
Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>
2021-02-22 11:53:43 +05:30
Shubham Chaudhary e39d5972c6
feat(probe): Adding onchaos mode in all experiments (#292)
Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>
2021-02-21 04:17:59 +05:30
Shubham Chaudhary 41e8cc095a
feat(imagePullSecrets): Passing imagePullSecrets into helper pod (#295)
Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>
2021-02-21 04:15:55 +05:30
Shubham Chaudhary 6728a5b798
chore(abort): updating termination grace period (#290)
Signed-off-by: shubhamchaudhary <shubham.chaudhary@mayadata.io>
2021-02-20 11:58:42 +05:30
Udit Gaurav 83d9bfe476
Fix docker buildx command in CI (#291)
Signed-off-by: udit <udit.gaurav@mayadata.io>
2021-02-20 10:15:37 +05:30
2,630 changed files with 37,380 additions and 954,109 deletions

.github/ISSUE_TEMPLATE.md vendored Normal file

@@ -0,0 +1,27 @@
<!-- This form is for bug reports and feature requests ONLY! -->
<!-- Thanks for filing an issue! Before hitting the button, please answer these questions.-->
## Is this a BUG REPORT or FEATURE REQUEST?
Choose one: BUG REPORT or FEATURE REQUEST
<!--
If this is a BUG REPORT, please:
- Fill in as much of the template below as you can. If you leave out information, we can't help you as well.
If this is a FEATURE REQUEST, please:
- Describe *in detail* the feature/behavior/change you'd like to see.
In both cases, be ready for followup questions, and please respond in a timely
manner. If we can't reproduce a bug or think a feature already exists, we
might close your issue. If we're wrong, PLEASE feel free to reopen it and
explain why.
-->
**What happened**:
**What you expected to happen**:
**How to reproduce it (as minimally and precisely as possible)**:
**Anything else we need to know?**:

.github/PULL_REQUEST_TEMPLATE.md vendored Normal file

@@ -0,0 +1,18 @@
<!-- Thanks for sending a pull request! Here are some tips for you -->
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Checklist:**
- [ ] Fixes #<issue number>
- [ ] PR message has document-related information
- [ ] Labelled this PR & related issue with `breaking-changes` tag
- [ ] PR message has breaking-changes related information
- [ ] Labelled this PR & related issue with `requires-upgrade` tag
- [ ] PR message has upgrade-related information
- [ ] Commit has unit tests
- [ ] Commit has integration tests
- [ ] E2E run Required for the changes

.github/auto-merge.yml vendored Normal file

@@ -0,0 +1,23 @@
# Configuration for probot-auto-merge - https://github.com/bobvanderlinden/probot-auto-merge
reportStatus: true
updateBranch: false
deleteBranchAfterMerge: true
mergeMethod: squash
minApprovals:
COLLABORATOR: 0
maxRequestedChanges:
NONE: 0
blockingLabels:
- DO NOT MERGE
- WIP
- blocked
# Will merge whenever the above conditions are met, but also
# the owner has approved or merge label was added.
rules:
- minApprovals:
OWNER: 1
- requiredLabels:
- merge


@@ -6,52 +6,54 @@ on:
types: [opened, synchronize, reopened]
jobs:
lint:
pre-checks:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '^1.13.1'
go-version: '1.20'
# Setup gopath
- name: Setting up GOPATH
run: |
echo "GOPATH=${GITHUB_WORKSPACE}/go" >> $GITHUB_ENV
# Checkout to the latest commit
# On specific directory/path
- uses: actions/checkout@v2
with:
path: go/src/github.com/${{github.repository}}
ref: ${{ github.event.pull_request.head.sha }}
#TODO: Add Dockerfile linting
# Running go-lint
- name: Checking Go-Lint
run : |
sudo apt-get update && sudo apt-get install golint
cd go/src/github.com/${{github.repository}}
make gotasks
- name: gofmt check
run: |
if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
then
echo "The following files were found to be not go formatted:"
gofmt -s -l .
exit 1
fi
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
gitleaks-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Run GitLeaks
run: |
wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz && \
tar -zxvf gitleaks_8.18.2_linux_x64.tar.gz && \
sudo mv gitleaks /usr/local/bin && gitleaks detect --source . -v
build:
needs: pre-checks
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '^1.13.1'
go-version: '1.20'
# Setup gopath
- name: Setting up GOPATH
run: |
echo "GOPATH=${GITHUB_WORKSPACE}/go" >> $GITHUB_ENV
# Checkout to the latest commit
# On specific directory/path
- uses: actions/checkout@v2
with:
path: go/src/github.com/${{github.repository}}
ref: ${{ github.event.pull_request.head.sha }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
@@ -64,23 +66,33 @@ jobs:
with:
version: latest
- name: Build Docker Image
env:
DOCKER_REPO: litmuschaos
DOCKER_IMAGE: go-runner
DOCKER_TAG: ci
run: |
cd go/src/github.com/${{github.repository}}
make build
- name: Build and push
uses: docker/build-push-action@v2
with:
push: false
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:ci
build-args: LITMUS_VERSION=3.10.0
trivy:
trivy:
needs: pre-checks
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: setup trivy
- name: Build an image from Dockerfile
run: |
wget https://github.com/aquasecurity/trivy/releases/download/v0.11.0/trivy_0.11.0_Linux-64bit.tar.gz
tar zxvf trivy_0.11.0_Linux-64bit.tar.gz
make trivy-check
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.10.0
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: 'docker.io/litmuschaos/go-runner:${{ github.sha }}'
format: 'table'
exit-code: '1'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'


@@ -1,182 +0,0 @@
# Run E2E tests using GitHub Chaos Actions
- When you commit code to your repository, you can continuously build and test it to make sure the commit doesn't introduce errors. The error could be a security, functional, or performance issue, and it can be caught using custom tests, linters, or pulled-in actions. This is the motivation for *Chaos Actions*, which run a chaos test against the application for a particular commit and thereby help track the application's behaviour at the commit level. The tests can also be triggered by commenting on the Pull Request.
The E2E action will run:
- On Pull Request added or updated (automated)
- Through comments on PR (user-defined)
## On Pull Request added or updated
We can run e2e on a PR according to the commit message. The syntax for triggering e2e via the commit message is given below (an example follows the keywords table).
**Note:** Your commit message should contain one of the following keywords to run the test. By default, it will run all available tests.
<table>
<tr>
<th>Keywords</th>
<th>Details</th>
</tr>
<tr>
<th><code>[skip ci]</code></th>
<td>The job is skipped; no test will run.</td>
</tr>
<tr>
<th><code>'[Run CI]'</code></th>
<td>This will run all the test for all the experiment available</td>
</tr>
<tr>
<th><code>'[Network Chaos]'</code></th>
<td>This will run all the test available for network experiment</td>
</tr>
<tr>
<th><code>'[Resource Chaos]'</code></th>
<td>This will run all the test available for resource experiment</td>
</tr>
<tr>
<th><code>'[IO Chaos]'</code></th>
<td>This will run all the test available for io experiment</td>
</tr>
<tr>
<th><code>'[Scale Chaos]'</code></th>
<td>This will run all the test available for scale experiment</td>
</tr>
<tr>
<th><code>'[Pod Delete]'</code></th>
<td>This will run the pod delete test.</td>
</tr>
<tr>
<th><code>'[Container Kill]'</code></th>
<td>This will run the container kill test.</td>
</tr>
</table>
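For example, a commit message that includes one of these keywords might look like the following (a minimal sketch; the message text is hypothetical, only the bracketed keyword matters):
```bash
# Hypothetical commit whose message carries the [Network Chaos] keyword,
# so the push triggers all the network chaos e2e tests described above.
git commit -s -m "tune pod-network-loss defaults [Network Chaos]"
```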
## Through comments on PR
- We can run tests for any desired experiment or set of experiments by just commenting on the Pull Request. The format of the comment is:
```bash
/run-e2e-<test-name/test-group>
```
_Experiments Available for custom bot:_
<table style="width:100%">
<tr>
<th>Resource chaos</th>
<th>Network Chaos</th>
<th>IO Chaos</th>
<th>Others</th>
</tr>
<tr>
<td>pod-cpu-hog</td>
<td>pod-network-latency</td>
<td>node-io-stress</td>
<td>pod-delete</td>
</tr>
<tr>
<td>pod-memory-hog</td>
<td>pod-network-loss</td>
<td></td>
<td>container-kill</td>
</tr>
<tr>
<td>node-cpu-hog</td>
<td>pod-network-corruption</td>
<td></td>
<td>pod-autoscaler</td>
</tr>
<tr>
<td>node-memory-hog</td>
<td>pod-network-duplication</td>
<td></td>
<td></td>
</tr>
</table>
### Group Tests
<table style="width:100%">
<tr>
<th>Command</th>
<th>Description</th>
</tr>
<tr>
<td><code>/run-e2e-all</code></td>
<td>Runs all available tests. This includes all resource chaos, network chaos, IO, and other tests. It will update the comment once it passes.</td>
</tr>
<tr>
<td><code>/run-e2e-network-chaos</code></td>
<td>Runs all network chaos tests. This includes pod network corruption, pod network duplication, pod network loss, and pod network latency.</td>
</tr>
<tr>
<td><code>/run-e2e-resource-chaos</code></td>
<td>Runs all resource chaos tests. This includes pod-level and node-level CPU and memory chaos tests.</td>
</tr>
<tr>
<td><code>/run-e2e-io-chaos</code></td>
<td>Runs all IO chaos tests. Currently it only includes node-io-stress.</td>
</tr>
</table>
### Individual Tests
<table style="width:100%">
<tr>
<th>Command</th>
<th>Description</th>
</tr>
<tr>
<td><code>/run-e2e-pod-delete</code></td>
<td>Runs the pod delete chaos test using GitHub chaos action, which fails the application pods</td>
</tr>
<tr>
<td><code>/run-e2e-container-kill</code></td>
<td>Runs the container kill experiment using GitHub chaos action, which kills containers of the application pod</td>
</tr>
<tr>
<td><code>/run-e2e-pod-cpu-hog</code></td>
<td>Runs the pod level CPU chaos experiment using GitHub chaos action, which consumes CPU resources of the application container</td>
</tr>
<tr>
<td><code>/run-e2e-pod-memory-hog</code></td>
<td>Runs the pod level memory chaos test, which consumes memory resources of the application container</td>
</tr>
<tr>
<td><code>/run-e2e-node-cpu-hog</code></td>
<td>Runs the node level CPU chaos test, which exhausts CPU resources on the Kubernetes Node</td>
</tr>
<tr>
<td><code>/run-e2e-node-memory-hog</code></td>
<td>Runs the node level memory chaos test, which exhausts memory resources on the Kubernetes Node</td>
</tr>
<tr>
<td><code>/run-e2e-node-io-stress</code></td>
<td>Runs the node level IO stress test, which puts IO stress on the Kubernetes Node</td>
</tr>
<tr>
<td><code>/run-e2e-pod-network-corruption</code></td>
<td>Runs the pod-network-corruption test, which injects network packet corruption into the application pod</td>
</tr>
<tr>
<td><code>/run-e2e-pod-network-latency</code></td>
<td>Runs the pod-network-latency test, which injects network packet latency into the application pod</td>
</tr>
<tr>
<td><code>/run-e2e-pod-network-loss</code></td>
<td>Runs the pod-network-loss test, which injects network packet loss into the application pod</td>
</tr>
<tr>
<td><code>/run-e2e-pod-network-duplication</code></td>
<td>Runs the pod-network-duplication test, which injects network packet duplication into the application pod</td>
</tr>
</table>
***Note:*** *All the tests are performed on a KinD cluster with containerd runtime.*


@@ -6,54 +6,38 @@ on:
- master
tags-ignore:
- '**'
jobs:
lint:
pre-checks:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '^1.13.1'
# Setup gopath
- name: Setting up GOPATH
run: |
echo "GOPATH=${GITHUB_WORKSPACE}/go" >> $GITHUB_ENV
# Checkout to the latest commit
# On specific directory/path
go-version: '1.20'
- uses: actions/checkout@v2
with:
path: go/src/github.com/${{github.repository}}
#TODO: Add Dockerfile linting
# Running go-lint
- name: Checking Go-Lint
run : |
sudo apt-get update && sudo apt-get install golint
cd go/src/github.com/${{github.repository}}
make gotasks
- name: gofmt check
run: |
if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
then
echo "The following files were found to be not go formatted:"
gofmt -s -l .
exit 1
fi
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
push:
needs: pre-checks
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '^1.13.1'
# Setup gopath
- name: Setting up GOPATH
run: |
echo "GOPATH=${GITHUB_WORKSPACE}/go" >> $GITHUB_ENV
# Checkout to the latest commit
# On specific directory/path
go-version: '1.20'
- uses: actions/checkout@v2
with:
path: go/src/github.com/${{github.repository}}
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
@@ -66,34 +50,17 @@ jobs:
with:
version: latest
- name: Build Docker Image
env:
DOCKER_REPO: litmuschaos
DOCKER_IMAGE: go-runner
DOCKER_TAG: ci
run: |
cd go/src/github.com/${{github.repository}}
make experiment-build
- name: Login to Docker Hub
uses: docker/login-action@v1
with:
username: ${{ secrets.DNAME }}
password: ${{ secrets.DPASS }}
- name: Push Docker Image
env:
DOCKER_REPO: litmuschaos
DOCKER_IMAGE: go-runner
DOCKER_TAG: ci
DNAME: ${{ secrets.DNAME }}
DPASS: ${{ secrets.DPASS }}
run: |
cd go/src/github.com/${{github.repository}}
make push
trivy:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
- name: setup trivy
run: |
wget https://github.com/aquasecurity/trivy/releases/download/v0.11.0/trivy_0.11.0_Linux-64bit.tar.gz
tar zxvf trivy_0.11.0_Linux-64bit.tar.gz
make trivy-check
- name: Build and push
uses: docker/build-push-action@v2
with:
push: true
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:ci
build-args: LITMUS_VERSION=3.10.0


@@ -6,52 +6,24 @@ on:
- '**'
jobs:
lint:
pre-checks:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '^1.13.1'
# Setup gopath
- name: Setting up GOPATH
run: |
echo "GOPATH=${GITHUB_WORKSPACE}/go" >> $GITHUB_ENV
# Checkout to the latest commit
# On specific directory/path
go-version: '1.20'
- uses: actions/checkout@v2
with:
path: go/src/github.com/${{github.repository}}
#TODO: Add Dockerfile linting
# Running go-lint
- name: Checking Go-Lint
run : |
sudo apt-get update && sudo apt-get install golint
cd go/src/github.com/${{github.repository}}
make gotasks
push:
needs: pre-checks
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '^1.13.1'
# Setup gopath
- name: Setting up GOPATH
run: |
echo "GOPATH=${GITHUB_WORKSPACE}/go" >> $GITHUB_ENV
# Checkout to the latest commit
# On specific directory/path
go-version: '1.20'
- uses: actions/checkout@v2
with:
path: go/src/github.com/${{github.repository}}
- name: Set Tag
run: |
@@ -63,7 +35,7 @@ jobs:
run: |
echo "RELEASE TAG: ${RELEASE_TAG}"
echo "${RELEASE_TAG}" > ${{ github.workspace }}/tag.txt
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
with:
@@ -75,36 +47,19 @@ jobs:
with:
version: latest
- name: Build Docker Image
- name: Login to Docker Hub
uses: docker/login-action@v1
with:
username: ${{ secrets.DNAME }}
password: ${{ secrets.DPASS }}
- name: Build and push
uses: docker/build-push-action@v2
env:
DOCKER_REPO: litmuschaos
DOCKER_IMAGE: go-runner
DOCKER_TAG: ${RELEASE_TAG}
DNAME: ${{ secrets.DNAME }}
DPASS: ${{ secrets.DPASS }}
run: |
cd go/src/github.com/${{github.repository}}
make experiment-build
- name: Push Docker Image
env:
DOCKER_REPO: litmuschaos
DOCKER_IMAGE: go-runner
DOCKER_TAG: ${RELEASE_TAG}
DNAME: ${{ secrets.DNAME }}
DPASS: ${{ secrets.DPASS }}
run: |
cd go/src/github.com/${{github.repository}}
make push
trivy:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
- name: setup trivy
run: |
wget https://github.com/aquasecurity/trivy/releases/download/v0.11.0/trivy_0.11.0_Linux-64bit.tar.gz
tar zxvf trivy_0.11.0_Linux-64bit.tar.gz
make trivy-check
RELEASE_TAG: ${{ env.RELEASE_TAG }}
with:
push: true
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:${{ env.RELEASE_TAG }},litmuschaos/go-runner:latest
build-args: LITMUS_VERSION=3.10.0


@@ -1,396 +0,0 @@
name: E2E
on:
pull_request:
branches: [master]
types: [edited, opened, synchronize, reopened]
jobs:
Generic_Tests:
if: "!contains(github.event.head_commit.message, '[skip ci]')"
runs-on: ubuntu-latest
steps:
#Using the last commit id of pull request
- uses: octokit/request-action@v2.x
id: get_PR_commits
with:
route: GET /repos/:repo/pulls/:pull_number/commits
repo: ${{ github.repository }}
pull_number: ${{ github.event.number }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: set commit to output
id: getcommit
run: |
prsha=$(echo $response | jq '.[-1].sha' | tr -d '"')
echo "::set-output name=sha::$prsha"
env:
response: ${{ steps.get_PR_commits.outputs.data }}
- uses: actions/checkout@v2
with:
ref: ${{steps.getcommit.outputs.sha}}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
#Install and configure a kind cluster
- name: Installing Prerequisites (KinD Cluster)
uses: engineerd/setup-kind@v0.5.0
with:
version: "v0.7.0"
- name: Configuring and testing the Installation
run: |
kubectl cluster-info --context kind-kind
kind get kubeconfig --internal >$HOME/.kube/config
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- name: Deploy a sample application for chaos injection
run: |
kubectl apply -f https://raw.githubusercontent.com/litmuschaos/chaos-ci-lib/master/app/nginx.yml
sleep 30
- name: Setting up kubeconfig ENV for Github Chaos Action
run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
env:
ACTIONS_ALLOW_UNSECURE_COMMANDS: true
- name: Setup Litmus
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
INSTALL_LITMUS: true
- name: Check the commit message
if: |
contains(github.event.head_commit.message, '[Pod Delete]') || contains(github.event.head_commit.message, '[Container Kill]') ||
contains(github.event.head_commit.message, '[Network Chaos]') || contains(github.event.head_commit.message, '[Resource Chaos]') ||
contains(github.event.head_commit.message, '[Scale Chaos]') || contains(github.event.head_commit.message, '[IO Chaos]') ||
contains(github.event.head_commit.message, '[Run CI]')
run: |
echo ::set-env name=TEST_RUN::true
env:
ACTIONS_ALLOW_UNSECURE_COMMANDS: true
- name: Running Litmus pod delete chaos experiment
if: "contains(github.event.head_commit.message, '[Pod Delete]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-delete
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
- name: Running container kill chaos experiment
if: "contains(github.event.head_commit.message, '[Container Kill]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: container-kill
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
CONTAINER_RUNTIME: containerd
- name: Running node-cpu-hog chaos experiment
if: "contains(github.event.head_commit.message, '[Resource Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: node-cpu-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
- name: Running node-memory-hog chaos experiment
if: "contains(github.event.head_commit.message, '[Resource Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: node-memory-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
- name: Running pod-cpu-hog chaos experiment
if: "contains(github.event.head_commit.message, '[Resource Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-cpu-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
CPU_CORES: 1
- name: Running pod-memory-hog chaos experiment
if: "contains(github.event.head_commit.message, '[Resource Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-memory-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
MEMORY_CONSUMPTION: 500
CHAOS_KILL_COMMAND: "kill -9 $(ps afx | grep \"[dd] if /dev/zero\" | awk '{print$1}' | tr '\\n' ' ')"
- name: Running pod network corruption chaos experiment
if: "contains(github.event.head_commit.message, '[Network Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-network-corruption
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
CONTAINER_RUNTIME: containerd
- name: Running pod network duplication chaos experiment
if: "contains(github.event.head_commit.message, '[Network Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-network-duplication
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
CONTAINER_RUNTIME: containerd
- name: Running pod-network-latency chaos experiment
if: "contains(github.event.head_commit.message, '[Network Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-network-latency
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
NETWORK_LATENCY: 60000
CONTAINER_RUNTIME: containerd
- name: Running pod-network-loss chaos experiment
if: "contains(github.event.head_commit.message, '[Network Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-network-loss
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
NETWORK_PACKET_LOSS_PERCENTAGE: 100
CONTAINER_RUNTIME: containerd
- name: Running pod autoscaler chaos experiment
if: "contains(github.event.head_commit.message, '[Scale Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-autoscaler
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TOTAL_CHAOS_DURATION: 60
- name: Running node-io-stress chaos experiment
if: "contains(github.event.head_commit.message, '[IO Chaos]') || contains(github.event.head_commit.message, '[Run CI]') || env.TEST_RUN != 'true'"
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: node-io-stress
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TOTAL_CHAOS_DURATION: 120
FILESYSTEM_UTILIZATION_PERCENTAGE: 10
- name: Check for all the jobs are succeeded
if: ${{ success() && env.TEST_RUN == 'true' }}
run: echo "All tests Are passed"
- name: Check for any job failed
if: ${{ failure() }}
run: echo "Some tests are failing please check..."
- name: Uninstall Litmus
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
LITMUS_CLEANUP: true
- name: Deleting KinD cluster
if: ${{ always() }}
run: kind delete cluster
Pod_Level_In_Series_Mode:
if: "!contains(github.event.head_commit.message, '[skip ci]')"
runs-on: ubuntu-latest
steps:
#Using the last commit id of pull request
- uses: octokit/request-action@v2.x
id: get_PR_commits
with:
route: GET /repos/:repo/pulls/:pull_number/commits
repo: ${{ github.repository }}
pull_number: ${{ github.event.number }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: set commit to output
id: getcommit
run: |
prsha=$(echo $response | jq '.[-1].sha' | tr -d '"')
echo "::set-output name=sha::$prsha"
env:
response: ${{ steps.get_PR_commits.outputs.data }}
# Checkout to the latest commit
# On specific directory/path
- uses: actions/checkout@v2
with:
ref: ${{steps.getcommit.outputs.sha}}
path: go/src/github.com/${{github.repository}}
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '^1.13.1'
# Setup gopath
- name: Setting up GOPATH
run: |
echo "GOPATH=${GITHUB_WORKSPACE}/go" >> $GITHUB_ENV
- name: Generating Go binary and Building docker image
run: |
cd ${GOPATH}/src/github.com/${{github.repository}}
make build-amd64
#Install and configure a K3S cluster
- name: Installing Prerequisites (K3S Cluster)
env:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
curl -sfL https://get.k3s.io | sh -s - --docker --write-kubeconfig-mode 664
sleep 30
kubectl get nodes
- name: Cloning Litmus E2E Repo
run: |
cd ${GOPATH}/src/github.com/${{github.repository}}
git clone https://github.com/litmuschaos/litmus-e2e.git -b generic
- name: Running Pod level experiment with affected percentage 100 and in series mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
cd ${GOPATH}/src/github.com/${{github.repository}}/litmus-e2e
make build-litmus
make app-deploy
make pod-affected-perc-ton-series
kubectl describe chaosexperiment -n litmus
- name: Deleting K3S cluster
if: ${{ always() }}
run: /usr/local/bin/k3s-uninstall.sh
Pod_Level_In_Parallel_Mode:
if: "!contains(github.event.head_commit.message, '[skip ci]')"
runs-on: ubuntu-latest
steps:
#Using the last commit id of pull request
- uses: octokit/request-action@v2.x
id: get_PR_commits
with:
route: GET /repos/:repo/pulls/:pull_number/commits
repo: ${{ github.repository }}
pull_number: ${{ github.event.number }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: set commit to output
id: getcommit
run: |
prsha=$(echo $response | jq '.[-1].sha' | tr -d '"')
echo "::set-output name=sha::$prsha"
env:
response: ${{ steps.get_PR_commits.outputs.data }}
# Checkout to the latest commit
# On specific directory/path
- uses: actions/checkout@v2
with:
ref: ${{steps.getcommit.outputs.sha}}
path: go/src/github.com/${{github.repository}}
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '^1.13.1'
# Setup gopath
- name: Setting up GOPATH
run: |
echo "GOPATH=${GITHUB_WORKSPACE}/go" >> $GITHUB_ENV
- name: Generating Go binary and Building docker image
run: |
cd ${GOPATH}/src/github.com/${{github.repository}}
make build-amd64
#Install and configure a K3S cluster
- name: Installing Prerequisites (K3S Cluster)
env:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
curl -sfL https://get.k3s.io | sh -s - --docker --write-kubeconfig-mode 664
sleep 30
kubectl get nodes
- name: Cloning Litmus E2E Repo
run: |
cd ${GOPATH}/src/github.com/${{github.repository}}
git clone https://github.com/litmuschaos/litmus-e2e.git -b generic
- name: Running Pod level experiment with affected percentage 100 and in parallel mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
cd ${GOPATH}/src/github.com/${{github.repository}}/litmus-e2e
make build-litmus
make app-deploy
make pod-affected-perc-ton-parallel
- name: Deleting K3S cluster
if: ${{ always() }}
run: /usr/local/bin/k3s-uninstall.sh

View File

@ -1,384 +0,0 @@
name: LitmusGo-CI
on:
issue_comment:
types: [created]
jobs:
tests:
if: contains(github.event.comment.html_url, '/pull/') && startsWith(github.event.comment.body, '/run-e2e')
runs-on: ubuntu-latest
steps:
- name: Notification for e2e Start
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
**Test Status:** The e2e test has been started please wait for the results ...
****
| Experiment | Result | Runtime |
|------------|--------|---------|
#Using the last commit id of pull request
- uses: octokit/request-action@v2.x
id: get_PR_commits
with:
route: GET /repos/:repo/pulls/:pull_number/commits
repo: ${{ github.repository }}
pull_number: ${{ github.event.issue.number }}
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: set commit to output
id: getcommit
run: |
prsha=$(echo $response | jq '.[-1].sha' | tr -d '"')
echo "::set-output name=sha::$prsha"
env:
response: ${{ steps.get_PR_commits.outputs.data }}
- uses: actions/checkout@v2
with:
ref: ${{steps.getcommit.outputs.sha}}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
#Install and configure a kind cluster
- name: Installing Prerequisites (KinD Cluster)
uses: engineerd/setup-kind@v0.5.0
with:
version: "v0.7.0"
- name: Configuring and testing the Installation
run: |
kubectl cluster-info --context kind-kind
kind get kubeconfig --internal >$HOME/.kube/config
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- name: Deploy a sample application for chaos injection
run: |
kubectl apply -f https://raw.githubusercontent.com/litmuschaos/chaos-ci-lib/master/app/nginx.yml
sleep 30
- name: Setting up kubeconfig ENV for Github Chaos Action
run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
env:
ACTIONS_ALLOW_UNSECURE_COMMANDS: true
- name: Setup Litmus
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
INSTALL_LITMUS: true
- name: Running Litmus pod delete chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-delete') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-delete
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
- name: Update pod delete result
if: startsWith(github.event.comment.body, '/run-e2e-pod-delete') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Delete | Pass | containerd |
- name: Running container kill chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-container-kill') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: container-kill
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
CONTAINER_RUNTIME: containerd
- name: Update container-kill result
if: startsWith(github.event.comment.body, '/run-e2e-container-kill') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Container Kill | Pass | containerd |
- name: Running node-cpu-hog chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-node-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: node-cpu-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
- name: Update node-cpu-hog result
if: startsWith(github.event.comment.body, '/run-e2e-node-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Node CPU Hog | Pass | containerd |
- name: Running node-memory-hog chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-node-memory-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: node-memory-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
- name: Update node-memory-hog result
if: startsWith(github.event.comment.body, '/run-e2e-node-memory-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Node MEMORY Hog | Pass | containerd |
- name: Running pod-cpu-hog chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-cpu-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
CPU_CORES: 1
- name: Update pod-cpu-hog result
if: startsWith(github.event.comment.body, '/run-e2e-pod-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod CPU Hog | Pass | containerd |
- name: Running pod-memory-hog chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-memory-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-cpu-hog
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
MEMORY_CONSUMPTION: 500
- name: Update pod-memory-hog result
if: startsWith(github.event.comment.body, '/run-e2e-pod-memory-hog') || startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Memory Hog | Pass | containerd |
- name: Running pod network corruption chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-corruption') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-network-corruption
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
CONTAINER_RUNTIME: containerd
- name: Update pod-network-corruption result
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-corruption') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Network Corruption | Pass | containerd |
- name: Running pod network duplication chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-duplication') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-network-duplication
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
CONTAINER_RUNTIME: containerd
- name: Update pod-network-duplication result
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-duplication') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Network Duplication | Pass | containerd |
- name: Running pod-network-latency chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-latency') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-network-latency
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
NETWORK_LATENCY: 60000
CONTAINER_RUNTIME: containerd
- name: Update pod-network-latency result
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-latency') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Network Latency | Pass | containerd |
- name: Running pod-network-loss chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-loss') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-network-loss
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 60
NETWORK_INTERFACE: eth0
NETWORK_PACKET_LOSS_PERCENTAGE: 100
CONTAINER_RUNTIME: containerd
- name: Update pod-network-loss result
if: startsWith(github.event.comment.body, '/run-e2e-pod-network-loss') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Network Loss | Pass | containerd |
- name: Running pod autoscaler chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-pod-autoscaler') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: pod-autoscaler
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TOTAL_CHAOS_DURATION: 60
- name: Update pod-autoscaler result
if: startsWith(github.event.comment.body, '/run-e2e-pod-autoscaler') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Pod Autoscaler | Pass | containerd |
- name: Running node-io-stress chaos experiment
if: startsWith(github.event.comment.body, '/run-e2e-node-io-stress') || startsWith(github.event.comment.body, '/run-e2e-io-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
EXPERIMENT_NAME: node-io-stress
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: ci
IMAGE_PULL_POLICY: IfNotPresent
JOB_CLEANUP_POLICY: delete
TOTAL_CHAOS_DURATION: 120
FILESYSTEM_UTILIZATION_PERCENTAGE: 10
- name: Update node-io-stress result
if: startsWith(github.event.comment.body, '/run-e2e-node-io-stress') || startsWith(github.event.comment.body, '/run-e2e-io-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
| Node IO Stress | Pass | containerd |
- name: Check the test run
if: |
startsWith(github.event.comment.body, '/run-e2e-pod-delete') || startsWith(github.event.comment.body, '/run-e2e-container-kill') ||
startsWith(github.event.comment.body, '/run-e2e-node-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-node-memory-hog') ||
startsWith(github.event.comment.body, '/run-e2e-pod-cpu-hog') || startsWith(github.event.comment.body, '/run-e2e-pod-memory-hog') ||
startsWith(github.event.comment.body, '/run-e2e-pod-network-corruption') || startsWith(github.event.comment.body, '/run-e2e-pod-network-loss') ||
startsWith(github.event.comment.body, '/run-e2e-pod-network-latency') || startsWith(github.event.comment.body, '/run-e2e-pod-network-duplication') ||
startsWith(github.event.comment.body, '/run-e2e-pod-autoscaler') || startsWith(github.event.comment.body, '/run-e2e-node-io-stress') ||
startsWith(github.event.comment.body, '/run-e2e-resource-chaos') || startsWith(github.event.comment.body, '/run-e2e-network-chaos') ||
startsWith(github.event.comment.body, '/run-e2e-io-chaos') || startsWith(github.event.comment.body, '/run-e2e-all')
run: |
echo ::set-env name=TEST_RUN::true
env:
ACTIONS_ALLOW_UNSECURE_COMMANDS: true
- name: Check for all the jobs are succeeded
if: ${{ success() && env.TEST_RUN == 'true' }}
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
****
**Test Result:** All tests are passed
**Run ID:** [${{ env.RUN_ID }}](https://github.com/litmuschaos/litmus-go/actions/runs/${{ env.RUN_ID }})
reactions: hooray
env:
RUN_ID: ${{ github.run_id }}
- name: Check for any job failed
if: ${{ failure() }}
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
****
**Test Failed:** Some tests are failed please check
**Run ID:** [${{ env.RUN_ID }}](https://github.com/litmuschaos/litmus-go/actions/runs/${{ env.RUN_ID }})
reactions: confused
env:
RUN_ID: ${{ github.run_id }}
- name: Uninstall Litmus
uses: mayadata-io/github-chaos-actions@v0.3.1
env:
LITMUS_CLEANUP: true
- name: Deleting KinD cluster
if: ${{ always() }}
run: kind delete cluster
- name: Check if any test ran or not
if: env.TEST_RUN != 'true'
uses: peter-evans/create-or-update-comment@v1
with:
comment-id: "${{ github.event.comment.id }}"
body: |
****
**Test Result:** No test found
**Run ID:** [${{ env.RUN_ID }}](https://github.com/litmuschaos/litmus-go/actions/runs/${{ env.RUN_ID }})
reactions: eyes
env:
RUN_ID: ${{ github.run_id }}

View File

@ -0,0 +1,198 @@
name: E2E
on:
pull_request:
branches: [master]
types: [opened, synchronize, reopened]
paths-ignore:
- '**.md'
- '**.yml'
- '**.yaml'
jobs:
Pod_Level_In_Serial_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in series mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-series
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
Pod_Level_In_Parallel_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
env:
KUBECONFIG: /home/runner/.kube/config
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in parallel mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-parallel
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
Node_Level_Tests:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Setup litmus and deploy application
env:
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
- name: Running Node Drain experiments
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: make node-drain
- name: Running Node Taint experiments
if: always()
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: make node-taint
- name: Deleting KinD cluster
if: always()
run: |
kubectl get nodes
kind delete cluster

27
.github/workflows/security-scan.yml vendored Normal file
View File

@ -0,0 +1,27 @@
---
name: Security Scan
on:
workflow_dispatch:
jobs:
trivy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Build an image from Dockerfile
run: |
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: 'docker.io/litmuschaos/go-runner:${{ github.sha }}'
format: 'table'
exit-code: '1'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'
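For a local run that roughly mirrors this workflow, a recent trivy release can scan the same image directly; the `dev` tag below is a placeholder:
```sh
# Build the image the same way the workflow does, then scan it locally (tag is a placeholder).
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:dev . \
  --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
trivy image --exit-code 1 --ignore-unfixed --vuln-type os,library \
  --severity CRITICAL,HIGH docker.io/litmuschaos/go-runner:dev
```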

2
.gitignore vendored
View File

@ -1,2 +1,4 @@
build/_output
.stignore
.idea/
.vscode/

62
CONTRIBUTING.md Normal file
View File

@ -0,0 +1,62 @@
# Contributing to Litmus-Go
Litmus is an Apache 2.0 Licensed project and uses the standard GitHub pull requests process to review and accept contributions.
There are several areas of Litmus that could use your help. For starters, you could help in improving the sections in this document by either creating a new issue describing the improvement or submitting a pull request to this repository.
- If you are a first-time contributor, please see [Steps to Contribute](#steps-to-contribute).
- If you would like to suggest new tests to be added to litmus, please go ahead and [create a new issue](https://github.com/litmuschaos/litmus/issues/new) describing your test. All you need to do is specify the workload type and the operations that you would like to perform on the workload.
- If you would like to work on something more involved, please connect with the Litmus Contributors.
- If you would like to make code contributions, all your commits should be signed with Developer Certificate of Origin. See [Sign your work](#sign-your-work).
## Steps to Contribute
- Find an issue to work on or create a new issue. The issues are maintained at [litmuschaos/litmus](https://github.com/litmuschaos/litmus/issues). You can pick up from a list of [good-first-issues](https://github.com/litmuschaos/litmus/labels/good%20first%20issue).
- Claim your issue by commenting your intent to work on it to avoid duplication of efforts.
- Fork the repository on GitHub.
- Create a branch from where you want to base your work (usually master).
- Make your changes.
- Relevant coding style guidelines are the [Go Code Review Comments](https://code.google.com/p/go-wiki/wiki/CodeReviewComments) and the _Formatting and style_ section of Peter Bourgon's [Go: Best Practices for Production Environments](http://peter.bourgon.org/go-in-production/#formatting-and-style).
- Commit your changes by making sure the commit messages convey the need and notes about the commit.
- Push your changes to the branch in your fork of the repository.
- Submit a pull request to the original repository. See [Pull Request checklist](#pull-request-checklist); a condensed command-line sketch of these steps follows below.
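As a rough illustration of the steps above (the fork URL, branch name, and commit message are placeholders; `upstream` is assumed to point at litmuschaos/litmus-go):
```sh
# Clone your fork and track the upstream repository (URLs below are placeholders).
git clone https://github.com/<your-username>/litmus-go.git
cd litmus-go
git remote add upstream https://github.com/litmuschaos/litmus-go.git

# Branch off the latest upstream master, commit with a sign-off, and push to your fork.
git fetch upstream
git checkout -b my-fix upstream/master
git commit -s -m "chore: describe the change here"
git push origin my-fix
```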
## Pull Request Checklist
- Rebase to the current master branch before submitting your pull request.
- Commits should be as small as possible. Each commit should follow the checklist below:
- For code changes, add tests relevant to the fixed bug or new feature
- Pass the compile and tests - includes spell checks, formatting, etc
- Commit header (first line) should convey what changed
- Commit body should include details such as why the changes are required and how the proposed changes address them
- DCO Signed
- If your PR is not getting reviewed or you need a specific person to review it, please reach out to the Litmus contributors at the [Litmus slack channel](https://app.slack.com/client/T09NY5SBT/CNXNB0ZTN)
## Sign your work
We use the Developer Certificate of Origin (DCO) as an additional safeguard for the LitmusChaos project. This is a well established and widely used mechanism to assure that contributors have confirmed their right to license their contribution under the project's license. Please add a line to every git commit message:
```sh
Signed-off-by: Random J Developer <random@developer.example.org>
```
Use your real name (sorry, no pseudonyms or anonymous contributions). The email id should match the email id provided in your GitHub profile.
If you set your `user.name` and `user.email` in git config, you can sign your commit automatically with `git commit -s`.
You can also use git [aliases](https://git-scm.com/book/tr/v2/Git-Basics-Git-Aliases) like `git config --global alias.ci 'commit -s'`. Now you can commit with `git ci` and the commit will be signed.
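For instance, a minimal sketch of the one-time setup and a signed commit (the name, email, and commit message are placeholders):
```sh
# One-time configuration; use your real name and the email from your GitHub profile.
git config --global user.name "Random J Developer"
git config --global user.email "random@developer.example.org"
git config --global alias.ci 'commit -s'

# -s (or the `git ci` alias) appends the Signed-off-by trailer automatically.
git commit -s -m "docs: clarify contribution steps"
```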
## Setting up your Development Environment
This project is implemented using Go and uses the standard golang tools for development and build. In addition, this project heavily relies on Docker and Kubernetes. It is expected that contributors:
- are familiar with working with Go
- are familiar with Docker containers
- are familiar with Kubernetes and have access to a Kubernetes cluster or Minikube to test the changes.
For the creation of new chaos-experiment and testing of the modified changes, see the detailed instructions [here](./contribute/developer-guide/README.md).
## Community
The litmus community will have a monthly community sync-up on 3rd Wednesday 22.00-23.00IST / 18.30-19.30CEST
- The community meeting details are available [here](https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q). Please feel free to join the community meeting.

View File

@ -1,30 +0,0 @@
FROM golang:1.13.4 as builder
WORKDIR /tmp/litmus/
# Copying the experiments and chaos libraries
COPY . .
# After copying the files, we need to ensure the files belong to the user,
# otherwise we will not be able to write build files
USER root
RUN chown -R 500 /tmp/litmus
USER 500
WORKDIR /tmp/litmus/experiments/generic/
# We need to ensure a reasonable build cache dir, as user 500 does not exist on certain systems,
# and will not have permission to write to /.cache, as the user does not exist
ENV XDG_CACHE_HOME=/tmp/.cache
# Building the executables and placing them in a separate directory
RUN go build -o /tmp/litmus/build/ -mod vendor ./...
# Using as main image the crictl image with copying only the binaries
FROM litmuschaos/crictl:latest
WORKDIR /tmp/litmus/
COPY --from=builder /tmp/litmus/build .

View File

@ -8,28 +8,30 @@
IS_DOCKER_INSTALLED = $(shell which docker >> /dev/null 2>&1; echo $$?)
# Docker info
DOCKER_REGISTRY ?= docker.io
DOCKER_REPO ?= litmuschaos
DOCKER_IMAGE ?= go-runner
DOCKER_TAG ?= ci
PACKAGES = $(shell go list ./... | grep -v '/vendor/')
.PHONY: all
all: deps gotasks build push trivy-check build-amd64 push-amd64
.PHONY: help
help:
@echo ""
@echo "Usage:-"
@echo "\tmake all -- [default] builds the litmus containers"
@echo "\tmake deps -- sets up dependencies for image build"
@echo "\tmake push -- pushes the litmus-go multi-arch image"
@echo "\tmake build-amd64 -- builds the litmus-go binary & docker amd64 image"
@echo "\tmake push-amd64 -- pushes the litmus-go amd64 image"
@echo ""
.PHONY: all
all: deps gotasks build push trivy-check
.PHONY: deps
deps: _build_check_docker
_build_check_docker:
@echo "------------------"
@echo "--> Check the Docker deps"
@echo "--> Check the Docker deps"
@echo "------------------"
@if [ $(IS_DOCKER_INSTALLED) -eq 1 ]; \
then echo "" \
@ -39,26 +41,7 @@ _build_check_docker:
fi;
.PHONY: gotasks
gotasks: format lint unused-package-check
.PHONY: format
format:
@echo "------------------"
@echo "--> Running go fmt"
@echo "------------------"
@go fmt $(PACKAGES)
.PHONY: lint
lint:
@echo "------------------"
@echo "--> Running golint"
@echo "------------------"
@go get -u golang.org/x/lint/golint
@golint $(PACKAGES)
@echo "------------------"
@echo "--> Running go vet"
@echo "------------------"
@go vet $(PACKAGES)
gotasks: unused-package-check
.PHONY: unused-package-check
unused-package-check:
@ -70,57 +53,48 @@ unused-package-check:
echo "go mod tidy checking failed!"; echo "$${tidy}"; echo; \
fi
.PHONY: build
build: experiment-build image-build
.PHONY: experiment-build
experiment-build:
.PHONY: docker.buildx
docker.buildx:
@echo "------------------------------"
@echo "--> Build experiment go binary"
@echo "--> Setting up Builder "
@echo "------------------------------"
@./build/go-multiarch-build.sh build/generate_go_binary
@if ! docker buildx ls | grep -q multibuilder; then\
docker buildx create --name multibuilder;\
docker buildx inspect multibuilder --bootstrap;\
docker buildx use multibuilder;\
fi
.PHONY: push
push: docker.buildx image-push
image-push:
@echo "------------------------"
@echo "--> Push go-runner image"
@echo "------------------------"
@echo "Pushing $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)"
@docker buildx build . --push --file build/Dockerfile --progress plain --platform linux/arm64,linux/amd64 --no-cache --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
.PHONY: image-build
image-build:
@echo "-------------------------"
@echo "--> Build go-runner image"
@echo "-------------------------"
@sudo docker buildx build --file build/Dockerfile --progress plain --platform linux/arm64,linux/amd64 --no-cache --tag $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) .
.PHONY: build-amd64
build-amd64:
@echo "------------------------------"
@echo "--> Build experiment go binary"
@echo "------------------------------"
@env GOOS=linux GOARCH=amd64 sh build/generate_go_binary
@echo "-------------------------"
@echo "--> Build go-runner image"
@echo "--> Build go-runner image"
@echo "-------------------------"
@sudo docker build --file build/Dockerfile --tag $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) . --build-arg TARGETARCH=amd64
@sudo docker build --file build/Dockerfile --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
.PHONY: push-amd64
push-amd64:
@echo "------------------------------"
@echo "--> Pushing image"
@echo "--> Pushing image"
@echo "------------------------------"
@sudo docker push $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
.PHONY: push
push: litmus-go-push
@sudo docker push $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
litmus-go-push:
@echo "-------------------"
@echo "--> go-runner image"
@echo "-------------------"
REPONAME="$(DOCKER_REPO)" IMGNAME="$(DOCKER_IMAGE)" IMGTAG="$(DOCKER_TAG)" ./build/push
.PHONY: trivy-check
trivy-check:
@echo "------------------------"
@echo "---> Running Trivy Check"
@echo "------------------------"
@./trivy --exit-code 0 --severity HIGH --no-progress $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
@./trivy --exit-code 0 --severity CRITICAL --no-progress $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
@./trivy --exit-code 0 --severity HIGH --no-progress $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
@./trivy --exit-code 0 --severity CRITICAL --no-progress $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
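Because the image coordinates are declared with `?=`, they can be overridden per invocation. A minimal sketch (the repository name and tag below are placeholders, and `make trivy-check` assumes the trivy binary has already been downloaded into the repo root, as the push workflow does):
```sh
# Build an amd64 image under a custom repo/tag, then scan it with the local trivy binary.
make build-amd64 DOCKER_REPO=myrepo DOCKER_TAG=dev
make trivy-check DOCKER_REPO=myrepo DOCKER_TAG=dev
```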

View File

@ -1,19 +1,40 @@
# LitmusGo:
- This repo consists of Litmus Chaos Experiments written in golang. The examples in this repo are good indicators
of how to construct the experiments in golang: complete with steady state checks, chaosresult generation, chaos injection etc..,
post chaos checks, create events and reports for observability and configure sinks for these.
[![Slack Channel](https://img.shields.io/badge/Slack-Join-purple)](https://slack.litmuschaos.io)
![GitHub Workflow](https://github.com/litmuschaos/litmus-go/actions/workflows/push.yml/badge.svg?branch=master)
[![Docker Pulls](https://img.shields.io/docker/pulls/litmuschaos/go-runner.svg)](https://hub.docker.com/r/litmuschaos/go-runner)
[![GitHub issues](https://img.shields.io/github/issues/litmuschaos/litmus-go)](https://github.com/litmuschaos/litmus-go/issues)
[![Twitter Follow](https://img.shields.io/twitter/follow/litmuschaos?style=social)](https://twitter.com/LitmusChaos)
[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/5297/badge)](https://bestpractices.coreinfrastructure.org/projects/5297)
[![Go Report Card](https://goreportcard.com/badge/github.com/litmuschaos/litmus-go)](https://goreportcard.com/report/github.com/litmuschaos/litmus-go)
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go?ref=badge_shield)
[![YouTube Channel](https://img.shields.io/badge/YouTube-Subscribe-red)](https://www.youtube.com/channel/UCa57PMqmz_j0wnteRa9nCaw)
<br><br>
## Run E2E on a Pull Request
This repo consists of Litmus Chaos Experiments written in golang. The examples in this repo are good indicators of how to construct the experiments in golang: complete with steady state checks, chaosresult generation, chaos injection, post chaos checks, creating events and reports for observability, and configuring sinks for these.
- We can run a certain number of custom tests on a PR using GitHub chaos actions; read about the [custom bot](https://github.com/litmuschaos/litmus-go/blob/master/.github/workflows/guide.md) to know more.
**NOTE**: This repo can be viewed as an extension to the [litmuschaos/litmus](https://github.com/litmuschaos/litmus) repo. The litmus repo will also continue to be the project's community-facing meta repo housing other important project artifacts. In that sense, litmus-go is very similar to and therefore a sister repo of [litmus-python](https://github.com/litmuschaos/litmus-python) which houses examples for experiment business logic written in python.
**NOTE**
## Litmus SDK
- This repo can be viewed as an extension to the [litmuschaos/litmus](https://github.com/litmuschaos/litmus) repo
in the sense that the litmus repo also houses a significant set of experiments, built using ansible. The litmus repo
will also continue to be the project's community-facing meta repo housing other important project artifacts. In that
sense, litmus-go is very similar to and therefore a sister repo of [litmus-python](https://github.com/litmuschaos/litmus-python) which
houses examples for experiment business logic written in python.
The Litmus SDK provides a simple way to bootstrap your experiment and helps create the aforementioned artifacts in the appropriate directory (i.e., as per the chaos-category) based on an attributes file provided as input by the chart-developer. The scaffolded files consist of placeholders which can then be filled as desired.
It generates the custom chaos experiments with some default Pre & Post Chaos Checks (AUT & Auxiliary Applications status checks). It reuses an existing chaoslib (present inside the /chaoslib directory) if one is available; otherwise, it creates a new chaoslib inside the corresponding directory.
Refer [Litmus-SDK](https://github.com/litmuschaos/litmus-go/blob/master/contribute/developer-guide/README.md) for more details.
## How to get started?
Refer the [LitmusChaos Docs](https://docs.litmuschaos.io) and [Experiment Docs](https://litmuschaos.github.io/litmus/experiments/categories/contents/)
## How do I contribute?
You can contribute by raising issues, improving the documentation, contributing to the core framework and tooling, etc.
Head over to the [Contribution guide](CONTRIBUTING.md)
## License
Here is a copy of the License: [`License`](LICENSE)
## License Status and Vulnerability Check
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Flitmuschaos%2Flitmus-go?ref=badge_large)

223
bin/experiment/experiment.go Executable file
View File

@ -0,0 +1,223 @@
package main
import (
"context"
"errors"
"flag"
"os"
// Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth"
// Or uncomment to load specific auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth/azure"
// _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
// _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
// _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"
"go.opentelemetry.io/otel"
awsSSMChaosByID "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-id/experiment"
awsSSMChaosByTag "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-tag/experiment"
azureDiskLoss "github.com/litmuschaos/litmus-go/experiments/azure/azure-disk-loss/experiment"
azureInstanceStop "github.com/litmuschaos/litmus-go/experiments/azure/instance-stop/experiment"
redfishNodeRestart "github.com/litmuschaos/litmus-go/experiments/baremetal/redfish-node-restart/experiment"
cassandraPodDelete "github.com/litmuschaos/litmus-go/experiments/cassandra/pod-delete/experiment"
gcpVMDiskLossByLabel "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-disk-loss-by-label/experiment"
gcpVMDiskLoss "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-disk-loss/experiment"
gcpVMInstanceStopByLabel "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-instance-stop-by-label/experiment"
gcpVMInstanceStop "github.com/litmuschaos/litmus-go/experiments/gcp/gcp-vm-instance-stop/experiment"
containerKill "github.com/litmuschaos/litmus-go/experiments/generic/container-kill/experiment"
diskFill "github.com/litmuschaos/litmus-go/experiments/generic/disk-fill/experiment"
dockerServiceKill "github.com/litmuschaos/litmus-go/experiments/generic/docker-service-kill/experiment"
kubeletServiceKill "github.com/litmuschaos/litmus-go/experiments/generic/kubelet-service-kill/experiment"
nodeCPUHog "github.com/litmuschaos/litmus-go/experiments/generic/node-cpu-hog/experiment"
nodeDrain "github.com/litmuschaos/litmus-go/experiments/generic/node-drain/experiment"
nodeIOStress "github.com/litmuschaos/litmus-go/experiments/generic/node-io-stress/experiment"
nodeMemoryHog "github.com/litmuschaos/litmus-go/experiments/generic/node-memory-hog/experiment"
nodeRestart "github.com/litmuschaos/litmus-go/experiments/generic/node-restart/experiment"
nodeTaint "github.com/litmuschaos/litmus-go/experiments/generic/node-taint/experiment"
podAutoscaler "github.com/litmuschaos/litmus-go/experiments/generic/pod-autoscaler/experiment"
podCPUHogExec "github.com/litmuschaos/litmus-go/experiments/generic/pod-cpu-hog-exec/experiment"
podCPUHog "github.com/litmuschaos/litmus-go/experiments/generic/pod-cpu-hog/experiment"
podDelete "github.com/litmuschaos/litmus-go/experiments/generic/pod-delete/experiment"
podDNSError "github.com/litmuschaos/litmus-go/experiments/generic/pod-dns-error/experiment"
podDNSSpoof "github.com/litmuschaos/litmus-go/experiments/generic/pod-dns-spoof/experiment"
podFioStress "github.com/litmuschaos/litmus-go/experiments/generic/pod-fio-stress/experiment"
podHttpLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-latency/experiment"
podHttpModifyBody "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-modify-body/experiment"
podHttpModifyHeader "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-modify-header/experiment"
podHttpResetPeer "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-reset-peer/experiment"
podHttpStatusCode "github.com/litmuschaos/litmus-go/experiments/generic/pod-http-status-code/experiment"
podIOStress "github.com/litmuschaos/litmus-go/experiments/generic/pod-io-stress/experiment"
podMemoryHogExec "github.com/litmuschaos/litmus-go/experiments/generic/pod-memory-hog-exec/experiment"
podMemoryHog "github.com/litmuschaos/litmus-go/experiments/generic/pod-memory-hog/experiment"
podNetworkCorruption "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-corruption/experiment"
podNetworkDuplication "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-duplication/experiment"
podNetworkLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-latency/experiment"
podNetworkLoss "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-loss/experiment"
podNetworkPartition "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-partition/experiment"
podNetworkRateLimit "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-rate-limit/experiment"
kafkaBrokerPodFailure "github.com/litmuschaos/litmus-go/experiments/kafka/kafka-broker-pod-failure/experiment"
ebsLossByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-id/experiment"
ebsLossByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-tag/experiment"
ec2TerminateByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-id/experiment"
ec2TerminateByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-tag/experiment"
rdsInstanceStop "github.com/litmuschaos/litmus-go/experiments/kube-aws/rds-instance-stop/experiment"
k6Loadgen "github.com/litmuschaos/litmus-go/experiments/load/k6-loadgen/experiment"
springBootFaults "github.com/litmuschaos/litmus-go/experiments/spring-boot/spring-boot-faults/experiment"
vmpoweroff "github.com/litmuschaos/litmus-go/experiments/vmware/vm-poweroff/experiment"
cli "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus"
)
func init() {
// Configure the text formatter: full timestamps, no field sorting, no level truncation.
logrus.SetFormatter(&logrus.TextFormatter{
FullTimestamp: true,
DisableSorting: true,
DisableLevelTruncation: true,
})
}
func main() {
initCtx := context.Background()
// Set up Observability.
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
shutdown, err := telemetry.InitOTelSDK(initCtx, true, otelExporterEndpoint)
if err != nil {
log.Errorf("Failed to initialize OTel SDK: %v", err)
return
}
defer func() {
err = errors.Join(err, shutdown(initCtx))
}()
initCtx = telemetry.GetTraceParentContext()
}
clients := cli.ClientSets{}
ctx, span := otel.Tracer(telemetry.TracerName).Start(initCtx, "ExecuteExperiment")
defer span.End()
// parse the experiment name
experimentName := flag.String("name", "pod-delete", "name of the chaos experiment")
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Errorf("Unable to Get the kubeconfig, err: %v", err)
return
}
log.Infof("Experiment Name: %v", *experimentName)
// invoke the corresponding experiment based on the (-name) flag
switch *experimentName {
case "container-kill":
containerKill.ContainerKill(ctx, clients)
case "disk-fill":
diskFill.DiskFill(ctx, clients)
case "kafka-broker-pod-failure":
kafkaBrokerPodFailure.KafkaBrokerPodFailure(ctx, clients)
case "kubelet-service-kill":
kubeletServiceKill.KubeletServiceKill(ctx, clients)
case "docker-service-kill":
dockerServiceKill.DockerServiceKill(ctx, clients)
case "node-cpu-hog":
nodeCPUHog.NodeCPUHog(ctx, clients)
case "node-drain":
nodeDrain.NodeDrain(ctx, clients)
case "node-io-stress":
nodeIOStress.NodeIOStress(ctx, clients)
case "node-memory-hog":
nodeMemoryHog.NodeMemoryHog(ctx, clients)
case "node-taint":
nodeTaint.NodeTaint(ctx, clients)
case "pod-autoscaler":
podAutoscaler.PodAutoscaler(ctx, clients)
case "pod-cpu-hog-exec":
podCPUHogExec.PodCPUHogExec(ctx, clients)
case "pod-delete":
podDelete.PodDelete(ctx, clients)
case "pod-io-stress":
podIOStress.PodIOStress(ctx, clients)
case "pod-memory-hog-exec":
podMemoryHogExec.PodMemoryHogExec(ctx, clients)
case "pod-network-corruption":
podNetworkCorruption.PodNetworkCorruption(ctx, clients)
case "pod-network-duplication":
podNetworkDuplication.PodNetworkDuplication(ctx, clients)
case "pod-network-latency":
podNetworkLatency.PodNetworkLatency(ctx, clients)
case "pod-network-loss":
podNetworkLoss.PodNetworkLoss(ctx, clients)
case "pod-network-partition":
podNetworkPartition.PodNetworkPartition(ctx, clients)
case "pod-network-rate-limit":
podNetworkRateLimit.PodNetworkRateLimit(ctx, clients)
case "pod-memory-hog":
podMemoryHog.PodMemoryHog(ctx, clients)
case "pod-cpu-hog":
podCPUHog.PodCPUHog(ctx, clients)
case "cassandra-pod-delete":
cassandraPodDelete.CasssandraPodDelete(ctx, clients)
case "aws-ssm-chaos-by-id":
awsSSMChaosByID.AWSSSMChaosByID(ctx, clients)
case "aws-ssm-chaos-by-tag":
awsSSMChaosByTag.AWSSSMChaosByTag(ctx, clients)
case "ec2-terminate-by-id":
ec2TerminateByID.EC2TerminateByID(ctx, clients)
case "ec2-terminate-by-tag":
ec2TerminateByTag.EC2TerminateByTag(ctx, clients)
case "ebs-loss-by-id":
ebsLossByID.EBSLossByID(ctx, clients)
case "ebs-loss-by-tag":
ebsLossByTag.EBSLossByTag(ctx, clients)
case "rds-instance-stop":
rdsInstanceStop.RDSInstanceStop(ctx, clients)
case "node-restart":
nodeRestart.NodeRestart(ctx, clients)
case "pod-dns-error":
podDNSError.PodDNSError(ctx, clients)
case "pod-dns-spoof":
podDNSSpoof.PodDNSSpoof(ctx, clients)
case "pod-http-latency":
podHttpLatency.PodHttpLatency(ctx, clients)
case "pod-http-status-code":
podHttpStatusCode.PodHttpStatusCode(ctx, clients)
case "pod-http-modify-header":
podHttpModifyHeader.PodHttpModifyHeader(ctx, clients)
case "pod-http-modify-body":
podHttpModifyBody.PodHttpModifyBody(ctx, clients)
case "pod-http-reset-peer":
podHttpResetPeer.PodHttpResetPeer(ctx, clients)
case "vm-poweroff":
vmpoweroff.VMPoweroff(ctx, clients)
case "azure-instance-stop":
azureInstanceStop.AzureInstanceStop(ctx, clients)
case "azure-disk-loss":
azureDiskLoss.AzureDiskLoss(ctx, clients)
case "gcp-vm-disk-loss":
gcpVMDiskLoss.VMDiskLoss(ctx, clients)
case "pod-fio-stress":
podFioStress.PodFioStress(ctx, clients)
case "gcp-vm-instance-stop":
gcpVMInstanceStop.VMInstanceStop(ctx, clients)
case "redfish-node-restart":
redfishNodeRestart.NodeRestart(ctx, clients)
case "gcp-vm-instance-stop-by-label":
gcpVMInstanceStopByLabel.GCPVMInstanceStopByLabel(ctx, clients)
case "gcp-vm-disk-loss-by-label":
gcpVMDiskLossByLabel.GCPVMDiskLossByLabel(ctx, clients)
case "spring-boot-cpu-stress", "spring-boot-memory-stress", "spring-boot-exceptions", "spring-boot-app-kill", "spring-boot-faults", "spring-boot-latency":
springBootFaults.Experiment(ctx, clients, *experimentName)
case "k6-loadgen":
k6Loadgen.Experiment(ctx, clients)
default:
log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *experimentName)
return
}
}
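For illustration only, a minimal self-contained sketch of the same -name-driven dispatch expressed as a map lookup instead of a switch; the fault names and handlers below are placeholders, not litmus-go code.

package main

import (
    "flag"
    "fmt"
    "os"
)

func main() {
    name := flag.String("name", "pod-delete", "name of the chaos experiment")
    flag.Parse()

    // Each entry maps a fault name to its entry point, mirroring the switch above.
    experiments := map[string]func(){
        "pod-delete":     func() { fmt.Println("running pod-delete") },
        "container-kill": func() { fmt.Println("running container-kill") },
    }

    run, ok := experiments[*name]
    if !ok {
        fmt.Fprintf(os.Stderr, "unsupported -name %q\n", *name)
        os.Exit(1)
    }
    run()
}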


@@ -1,115 +0,0 @@
package main
import (
"flag"
// Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth"
// Or uncomment to load specific auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth/azure"
// _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
// _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
// _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"
cassandraPodDelete "github.com/litmuschaos/litmus-go/experiments/cassandra/pod-delete/experiment"
containerKill "github.com/litmuschaos/litmus-go/experiments/generic/container-kill/experiment"
diskFill "github.com/litmuschaos/litmus-go/experiments/generic/disk-fill/experiment"
kubeletServiceKill "github.com/litmuschaos/litmus-go/experiments/generic/kubelet-service-kill/experiment"
nodeCPUHog "github.com/litmuschaos/litmus-go/experiments/generic/node-cpu-hog/experiment"
nodeDrain "github.com/litmuschaos/litmus-go/experiments/generic/node-drain/experiment"
nodeIOStress "github.com/litmuschaos/litmus-go/experiments/generic/node-io-stress/experiment"
nodeMemoryHog "github.com/litmuschaos/litmus-go/experiments/generic/node-memory-hog/experiment"
nodeRestart "github.com/litmuschaos/litmus-go/experiments/generic/node-restart/experiment"
nodeTaint "github.com/litmuschaos/litmus-go/experiments/generic/node-taint/experiment"
podAutoscaler "github.com/litmuschaos/litmus-go/experiments/generic/pod-autoscaler/experiment"
podCPUHog "github.com/litmuschaos/litmus-go/experiments/generic/pod-cpu-hog/experiment"
podDelete "github.com/litmuschaos/litmus-go/experiments/generic/pod-delete/experiment"
podIOStress "github.com/litmuschaos/litmus-go/experiments/generic/pod-io-stress/experiment"
podMemoryHog "github.com/litmuschaos/litmus-go/experiments/generic/pod-memory-hog/experiment"
podNetworkCorruption "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-corruption/experiment"
podNetworkDuplication "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-duplication/experiment"
podNetworkLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-latency/experiment"
podNetworkLoss "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-loss/experiment"
kafkaBrokerPodFailure "github.com/litmuschaos/litmus-go/experiments/kafka/kafka-broker-pod-failure/experiment"
ebsLoss "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss/experiment"
ec2terminate "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate/experiment"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/sirupsen/logrus"
)
func init() {
// Log as JSON instead of the default ASCII formatter.
logrus.SetFormatter(&logrus.TextFormatter{
FullTimestamp: true,
DisableSorting: true,
DisableLevelTruncation: true,
})
}
func main() {
clients := clients.ClientSets{}
// parse the experiment name
experimentName := flag.String("name", "pod-delete", "name of the chaos experiment")
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Fatalf("Unable to Get the kubeconfig, err: %v", err)
}
log.Infof("Experiment Name: %v", *experimentName)
// invoke the corresponding experiment based on the (-name) flag
switch *experimentName {
case "container-kill":
containerKill.ContainerKill(clients)
case "disk-fill":
diskFill.DiskFill(clients)
case "kafka-broker-pod-failure":
kafkaBrokerPodFailure.KafkaBrokerPodFailure(clients)
case "kubelet-service-kill":
kubeletServiceKill.KubeletServiceKill(clients)
case "node-cpu-hog":
nodeCPUHog.NodeCPUHog(clients)
case "node-drain":
nodeDrain.NodeDrain(clients)
case "node-io-stress":
nodeIOStress.NodeIOStress(clients)
case "node-memory-hog":
nodeMemoryHog.NodeMemoryHog(clients)
case "node-taint":
nodeTaint.NodeTaint(clients)
case "pod-autoscaler":
podAutoscaler.PodAutoscaler(clients)
case "pod-cpu-hog":
podCPUHog.PodCPUHog(clients)
case "pod-delete":
podDelete.PodDelete(clients)
case "pod-io-stress":
podIOStress.PodIOStress(clients)
case "pod-memory-hog":
podMemoryHog.PodMemoryHog(clients)
case "pod-network-corruption":
podNetworkCorruption.PodNetworkCorruption(clients)
case "pod-network-duplication":
podNetworkDuplication.PodNetworkDuplication(clients)
case "pod-network-latency":
podNetworkLatency.PodNetworkLatency(clients)
case "pod-network-loss":
podNetworkLoss.PodNetworkLoss(clients)
case "cassandra-pod-delete":
cassandraPodDelete.CasssandraPodDelete(clients)
case "ec2-terminate":
ec2terminate.EC2Terminate(clients)
case "ebs-loss":
ebsLoss.EBSLoss(clients)
case "node-restart":
nodeRestart.NodeRestart(clients)
default:
log.Fatalf("Unsupported -name %v, please provide the correct value of -name args", *experimentName)
}
}

bin/helper/helper.go (new file, 90 lines added)

@@ -0,0 +1,90 @@
package main
import (
"context"
"errors"
"flag"
"os"
// Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth"
// Or uncomment to load specific auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth/azure"
// _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
// _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
// _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"
containerKill "github.com/litmuschaos/litmus-go/chaoslib/litmus/container-kill/helper"
diskFill "github.com/litmuschaos/litmus-go/chaoslib/litmus/disk-fill/helper"
httpChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/helper"
networkChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/helper"
dnsChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/pod-dns-chaos/helper"
stressChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/helper"
cli "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
func init() {
// Log as JSON instead of the default ASCII formatter
logrus.SetFormatter(&logrus.TextFormatter{
FullTimestamp: true,
DisableSorting: true,
DisableLevelTruncation: true,
})
}
func main() {
ctx := context.Background()
// Set up Observability.
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
shutdown, err := telemetry.InitOTelSDK(ctx, true, otelExporterEndpoint)
if err != nil {
log.Errorf("Failed to initialize OTel SDK: %v", err)
return
}
defer func() {
err = errors.Join(err, shutdown(ctx))
}()
ctx = telemetry.GetTraceParentContext()
}
clients := cli.ClientSets{}
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "ExecuteExperimentHelper")
defer span.End()
// parse the helper name
helperName := flag.String("name", "", "name of the helper pod")
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Errorf("Unable to Get the kubeconfig, err: %v", err)
return
}
log.Infof("Helper Name: %v", *helperName)
// invoke the corresponding helper based on the (-name) flag
switch *helperName {
case "container-kill":
containerKill.Helper(ctx, clients)
case "disk-fill":
diskFill.Helper(ctx, clients)
case "dns-chaos":
dnsChaos.Helper(ctx, clients)
case "stress-chaos":
stressChaos.Helper(ctx, clients)
case "network-chaos":
networkChaos.Helper(ctx, clients)
case "http-chaos":
httpChaos.Helper(ctx, clients)
default:
log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *helperName)
return
}
}
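A hypothetical sketch of how a helper process could rebuild its trace context from a traceparent value handed over by the parent experiment pod; the TRACE_PARENT variable name and this wiring are assumptions for illustration and may differ from what litmus-go's telemetry package actually does.

package main

import (
    "context"
    "fmt"
    "os"

    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/trace"
)

func main() {
    // TRACE_PARENT is assumed to carry a W3C traceparent value such as
    // "00-<trace-id>-<span-id>-01", set by the parent experiment pod.
    carrier := propagation.MapCarrier{"traceparent": os.Getenv("TRACE_PARENT")}
    ctx := propagation.TraceContext{}.Extract(context.Background(), carrier)
    fmt.Println("remote span context valid:", trace.SpanContextFromContext(ctx).IsValid())
}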


@@ -1,55 +1,112 @@
FROM alpine:3.13
# Multi-stage docker build
# Build stage
FROM golang:1.22 AS builder
ARG TARGETOS=linux
ARG TARGETARCH
ADD . /litmus-go
WORKDIR /litmus-go
RUN export GOOS=${TARGETOS} && \
export GOARCH=${TARGETARCH}
RUN CGO_ENABLED=0 go build -o /output/experiments ./bin/experiment
RUN CGO_ENABLED=0 go build -o /output/helpers ./bin/helper
# Packaging stage
FROM registry.access.redhat.com/ubi9/ubi:9.4
LABEL maintainer="LitmusChaos"
ARG TARGETARCH
ARG USER=litmus
RUN rm -rf /var/lib/apt/lists/*
ARG LITMUS_VERSION
# Install generally useful things
RUN apk --update add \
sudo \
htop\
bash\
make\
git \
curl\
iproute2\
stress-ng\
openssh-client\
libc6-compat \
sshpass
RUN yum install -y \
sudo \
sshpass \
procps \
openssh-clients
# Change default shell from ash to bash
RUN sed -i -e "s/bin\/ash/bin\/bash/" /etc/passwd
# tc binary
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-6.2.0-5.el9.$(uname -m).rpm
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-tc-6.2.0-5.el9.$(uname -m).rpm
# iptables
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iptables-libs-1.8.8-6.el9_1.$(uname -m).rpm
RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-libs-1.8.8-6.el9.2.$(uname -m).rpm
RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-1.8.8-6.el9.2.$(uname -m).rpm
# stress-ng
RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/Judy-1.0.5-28.el9.$(uname -m).rpm
RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/stress-ng-0.14.00-2.el9.$(uname -m).rpm
#Installing Kubectl
ENV KUBE_LATEST_VERSION="v1.18.0"
RUN curl -L https://storage.googleapis.com/kubernetes-release/release/${KUBE_LATEST_VERSION}/bin/linux/${TARGETARCH}/kubectl -o /usr/local/bin/kubectl && \
chmod +x /usr/local/bin/kubectl
ENV KUBE_LATEST_VERSION="v1.31.0"
RUN curl -L https://storage.googleapis.com/kubernetes-release/release/${KUBE_LATEST_VERSION}/bin/linux/${TARGETARCH}/kubectl -o /usr/bin/kubectl && \
chmod 755 /usr/bin/kubectl
#Installing crictl binaries
RUN curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.16.0/crictl-v1.16.0-linux-${TARGETARCH}.tar.gz --output crictl-v1.16.0-linux-${TARGETARCH}.tar.gz && \
tar zxvf crictl-v1.16.0-linux-${TARGETARCH}.tar.gz -C /usr/local/bin
#Installing pumba binaries
ENV PUMBA_VERSION="0.7.7"
RUN curl -L https://github.com/alexei-led/pumba/releases/download/${PUMBA_VERSION}/pumba_linux_${TARGETARCH} --output /usr/local/bin/pumba && chmod +x /usr/local/bin/pumba
RUN curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.31.1/crictl-v1.31.1-linux-${TARGETARCH}.tar.gz --output crictl-v1.31.1-linux-${TARGETARCH}.tar.gz && \
tar zxvf crictl-v1.31.1-linux-${TARGETARCH}.tar.gz -C /sbin && \
chmod 755 /sbin/crictl
#Installing promql cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/raw/master/custom/promql-cli/promql-linux-${TARGETARCH} --output /usr/local/bin/promql && chmod +x /usr/local/bin/promql
RUN curl -L https://github.com/chaosnative/promql-cli/releases/download/3.0.0-beta6/promql_linux_${TARGETARCH} --output /usr/bin/promql && chmod 755 /usr/bin/promql
COPY --from=docker:19.03 /usr/local/bin/docker /usr/local/bin/
#Installing pause cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/pause-linux-${TARGETARCH} --output /usr/bin/pause && chmod 755 /usr/bin/pause
#Copying Necessary Files
COPY ./build/_output/${TARGETARCH} ./litmus
#Installing dns_interceptor cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/dns_interceptor --output /sbin/dns_interceptor && chmod 755 /sbin/dns_interceptor
#add new user
RUN adduser -D -S $USER \
&& echo "$USER ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/$USER \
&& chmod 0440 /etc/sudoers.d/$USER
#Installing nsutil cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil-linux-${TARGETARCH} --output /sbin/nsutil && chmod 755 /sbin/nsutil
USER $USER
#Installing nsutil shared lib
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil_${TARGETARCH}.so --output /usr/local/lib/nsutil.so && chmod 755 /usr/local/lib/nsutil.so
WORKDIR /litmus
# Installing toxiproxy binaries
RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/cli/cli/toxiproxy-cli-linux-${TARGETARCH}.tar.gz --output toxiproxy-cli-linux-${TARGETARCH}.tar.gz && \
tar zxvf toxiproxy-cli-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
chmod 755 /sbin/toxiproxy-cli
RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/server/server/toxiproxy-server-linux-${TARGETARCH}.tar.gz --output toxiproxy-server-linux-${TARGETARCH}.tar.gz && \
tar zxvf toxiproxy-server-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
chmod 755 /sbin/toxiproxy-server
ENV APP_USER=litmus
ENV APP_DIR="/$APP_USER"
ENV DATA_DIR="$APP_DIR/data"
# The user ID of the app user
ENV APP_USER_ID=2000
RUN useradd -s /bin/true -u $APP_USER_ID -m -d $APP_DIR $APP_USER
# change to 0(root) group because openshift will run container with arbitrary uid as a member of root group
RUN chgrp -R 0 "$APP_DIR" && chmod -R g=u "$APP_DIR"
# Giving sudo to all users (required for almost all experiments)
RUN echo 'ALL ALL=(ALL:ALL) NOPASSWD: ALL' >> /etc/sudoers
WORKDIR $APP_DIR
COPY --from=builder /output/ .
COPY --from=docker:27.0.3 /usr/local/bin/docker /sbin/docker
RUN chmod 755 /sbin/docker
# Set permissions and ownership for the copied binaries
RUN chmod 755 ./experiments ./helpers && \
chown ${APP_USER}:0 ./experiments ./helpers
# Set ownership for binaries in /sbin and /usr/bin
RUN chown ${APP_USER}:0 /sbin/* /usr/bin/* && \
chown root:root /usr/bin/sudo && \
chmod 4755 /usr/bin/sudo
# Copying Necessary Files
COPY ./pkg/cloud/aws/common/ssm-docs/LitmusChaos-AWS-SSM-Docs.yml ./LitmusChaos-AWS-SSM-Docs.yml
RUN chown ${APP_USER}:0 ./LitmusChaos-AWS-SSM-Docs.yml && chmod 755 ./LitmusChaos-AWS-SSM-Docs.yml
USER ${APP_USER}


@@ -1,8 +0,0 @@
# Building go binaries for container_kill helper
go build -o build/_output/${GOARCH}/helper/container-killer ./chaoslib/litmus/container-kill/helper
# Building go binaries for network_chaos helper
go build -o build/_output/${GOARCH}/helper/network-chaos ./chaoslib/litmus/network-chaos/helper
# Building go binaries for disk_fill helper
go build -o build/_output/${GOARCH}/helper/disk-fill ./chaoslib/litmus/disk-fill/helper
# Building go binaries for all experiments
go build -o build/_output/${GOARCH}/experiments ./bin


@@ -1,24 +0,0 @@
#!/usr/bin/env bash
package=$1
if [[ -z "$package" ]]; then
echo "usage: $0 <package-name>"
exit 1
fi
package_split=(${package//\// })
package_name=${package_split[-1]}
# Add the architecture for building image
platforms=("linux/amd64" "linux/arm64")
for platform in "${platforms[@]}"
do
platform_split=(${platform//\// })
GOOS=${platform_split[0]}
GOARCH=${platform_split[1]}
env GOOS=$GOOS GOARCH=$GOARCH sh $package
if [ $? -ne 0 ]; then
echo 'An error has occurred! Aborting the script execution...'
exit 1
fi
done


@@ -0,0 +1,6 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker


@@ -1,35 +0,0 @@
#!/bin/bash
set -e
if [ -z "${REPONAME}" ]
then
REPONAME="litmuschaos"
fi
if [ -z "${IMGNAME}" ] || [ -z "${IMGTAG}" ];
then
echo "Image details are missing. Nothing to push.";
exit 1
fi
IMAGEID=$( sudo docker images -q ${REPONAME}/${IMGNAME}:${IMGTAG} )
if [ ! -z "${DNAME}" ] && [ ! -z "${DPASS}" ];
then
sudo docker login -u "${DNAME}" -p "${DPASS}";
# Push image to docker hub
echo "Pushing ${REPONAME}/${IMGNAME}:${IMGTAG} ...";
sudo docker buildx build --file build/Dockerfile --push --progress plain --platform linux/arm64,linux/amd64 --no-cache --tag ${REPONAME}/${IMGNAME}:${IMGTAG} .
if [ ! -z "${RELEASE_TAG}" ] ;
then
# Push with different tags if tagged as a release
# When github is tagged with a release, then Travis will
# set the release tag in env RELEASE_TAG
echo "Pushing ${REPONAME}/${IMGNAME}:${RELEASE_TAG} ...";
sudo docker buildx build --file build/Dockerfile --push --progress plain --platform linux/arm64,linux/amd64 --no-cache --tag ${REPONAME}/${IMGNAME}:${RELEASE_TAG} .
echo "Pushing ${REPONAME}/${IMGNAME}:latest ...";
sudo docker buildx build --file build/Dockerfile --push --progress plain --platform linux/arm64,linux/amd64 --no-cache --tag ${REPONAME}/${IMGNAME}:latest .
fi;
else
echo "No docker credentials provided. Skip uploading ${REPONAME}/${IMGNAME}:${IMGTAG} to docker hub";
fi;


@@ -0,0 +1,180 @@
package lib
import (
"context"
"os"
"strings"
"time"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
// InjectChaosInSerialMode will inject the aws ssm chaos in serial mode, i.e., one after the other
func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Running SSM command on the instance
for i, ec2ID := range instanceIDList {
//Sending AWS SSM command
log.Info("[Chaos]: Starting the ssm command")
ec2IDList := strings.Fields(ec2ID)
commandId, err := ssm.SendSSMCommand(experimentsDetails, ec2IDList)
if err != nil {
return stacktrace.Propagate(err, "failed to send ssm command")
}
//prepare commands for abort recovery
experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
//wait for the ssm command to get in running state
log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return stacktrace.Propagate(err, "failed to start ssm command")
}
common.SetTargets(ec2ID, "injected", "EC2", chaosDetails)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//wait for the ssm command to get succeeded in the given chaos duration
log.Info("[Wait]: Waiting for the ssm command to get completed")
if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return stacktrace.Propagate(err, "failed to send ssm command")
}
common.SetTargets(ec2ID, "reverted", "EC2", chaosDetails)
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// InjectChaosInParallelMode will inject the aws ssm chaos in parallel mode, i.e., all at once
func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Sending AWS SSM command
log.Info("[Chaos]: Starting the ssm command")
commandId, err := ssm.SendSSMCommand(experimentsDetails, instanceIDList)
if err != nil {
return stacktrace.Propagate(err, "failed to send ssm command")
}
//prepare commands for abort recovery
experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
for _, ec2ID := range instanceIDList {
//wait for the ssm command to get in running state
log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return stacktrace.Propagate(err, "failed to start ssm command")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
for _, ec2ID := range instanceIDList {
//wait for the ssm command to get succeeded in the given chaos duration
log.Info("[Wait]: Waiting for the ssm command to get completed")
if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return stacktrace.Propagate(err, "failed to send ssm command")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// AbortWatcher watches for the abort signal and reverts the chaos
func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, abort chan os.Signal) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
switch {
case len(experimentsDetails.CommandIDs) != 0:
for _, commandId := range experimentsDetails.CommandIDs {
if err := ssm.CancelCommand(commandId, experimentsDetails.Region); err != nil {
log.Errorf("[Abort]: Failed to cancel command, recovery failed: %v", err)
}
}
default:
log.Info("[Abort]: No SSM Command found to cancel")
}
if err := ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region); err != nil {
log.Errorf("Failed to delete ssm document: %v", err)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
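The serial and parallel injectors above share the same duration-driven loop; a minimal, generic sketch of that loop follows, with injectOnce standing in as a placeholder for the fault-specific work.

package main

import (
    "fmt"
    "time"
)

// runChaosLoop keeps injecting and waiting in chaos-interval steps until the
// configured chaos duration has elapsed.
func runChaosLoop(chaosDuration, chaosInterval time.Duration, injectOnce func() error) error {
    start := time.Now()
    for time.Since(start) < chaosDuration {
        if err := injectOnce(); err != nil {
            return err
        }
        time.Sleep(chaosInterval)
    }
    return nil
}

func main() {
    _ = runChaosLoop(10*time.Second, 2*time.Second, func() error {
        fmt.Println("injecting fault iteration")
        return nil
    })
}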


@@ -0,0 +1,91 @@
package ssm
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareAWSSSMChaosByID contains the preparation and injection steps for the experiment
func PrepareAWSSSMChaosByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSSSMFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//create and upload the ssm document on the given aws service monitoring docs
if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "could not create and upload the ssm document")
}
experimentsDetails.IsDocsUploaded = true
log.Info("[Info]: SSM docs uploaded successfully")
// watching for the abort signal and revert the chaos
go lib.AbortWatcher(experimentsDetails, abort)
//get the instance id or list of instance ids
instanceIDList := strings.Split(experimentsDetails.EC2InstanceID, ",")
if experimentsDetails.EC2InstanceID == "" || len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Delete the ssm document on the given aws service monitoring docs
err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to delete ssm doc")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
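The inject/abort channels above follow a common signal-relay pattern; a simplified, self-contained sketch is shown below, where cleanup stands in for the revert logic (cancelling SSM commands and deleting the SSM document).

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Relay SIGINT/SIGTERM to a channel and let a watcher goroutine
    // perform cleanup before exiting.
    abort := make(chan os.Signal, 1)
    signal.Notify(abort, os.Interrupt, syscall.SIGTERM)

    cleanup := func() { fmt.Println("reverting chaos before exit") }

    go func() {
        <-abort
        cleanup()
        os.Exit(1)
    }()

    // Placeholder for the chaos injection work.
    time.Sleep(5 * time.Second)
}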


@@ -0,0 +1,86 @@
package ssm
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
// PrepareAWSSSMChaosByTag contains the preparation and injection steps for the experiment
func PrepareAWSSSMChaosByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//create and upload the ssm document on the given aws service monitoring docs
if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "could not create and upload the ssm document")
}
experimentsDetails.IsDocsUploaded = true
log.Info("[Info]: SSM docs uploaded successfully")
// watching for the abort signal and revert the chaos
go lib.AbortWatcher(experimentsDetails, abort)
instanceIDList := common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, experimentsDetails.TargetInstanceIDList)
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIDList))
if len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Delete the ssm document on the given aws service monitoring docs
err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to delete ssm doc")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
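A hypothetical sketch of percentage-based target selection follows; the real common.FilterBasedOnPercentage may order, round, or randomize differently, so this only illustrates the idea of taking N% of the discovered instance IDs.

package main

import (
    "fmt"
    "math/rand"
)

// filterByPercentage returns roughly percentage% of ids, at least one for a
// non-zero percentage. Assumed semantics for illustration only.
func filterByPercentage(percentage int, ids []string) []string {
    if percentage <= 0 || len(ids) == 0 {
        return nil
    }
    n := len(ids) * percentage / 100
    if n == 0 {
        n = 1
    }
    if n > len(ids) {
        n = len(ids)
    }
    rand.Shuffle(len(ids), func(i, j int) { ids[i], ids[j] = ids[j], ids[i] })
    return ids[:n]
}

func main() {
    fmt.Println(filterByPercentage(50, []string{"i-aaa", "i-bbb", "i-ccc", "i-ddd"}))
}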


@@ -0,0 +1,299 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/Azure/azure-sdk-for-go/profiles/latest/compute/mgmt/compute"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
diskStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/disk"
instanceStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareChaos contains the prepration and injection steps for the experiment
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureDiskLossFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//get the disk name or list of disk names
diskNameList := strings.Split(experimentsDetails.VirtualDiskNames, ",")
if experimentsDetails.VirtualDiskNames == "" || len(diskNameList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume names found to detach"}
}
instanceNamesWithDiskNames, err := diskStatus.GetInstanceNameForDisks(diskNameList, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup)
if err != nil {
return stacktrace.Propagate(err, "error fetching attached instances for disks")
}
// Get the instance name with attached disks
attachedDisksWithInstance := make(map[string]*[]compute.DataDisk)
for instanceName := range instanceNamesWithDiskNames {
attachedDisksWithInstance[instanceName], err = diskStatus.GetInstanceDiskList(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, experimentsDetails.ScaleSet, instanceName)
if err != nil {
return stacktrace.Propagate(err, "error fetching virtual disks")
}
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, attachedDisksWithInstance, instanceNamesWithDiskNames, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
}
return nil
}
// injectChaosInParallelMode will inject the Azure disk loss chaos in parallel mode, i.e., all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disk"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Detaching the virtual disks
log.Info("[Chaos]: Detaching the virtual disks from the instances")
for instanceName, diskNameList := range instanceNamesWithDiskNames {
if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
return stacktrace.Propagate(err, "failed to detach disks")
}
}
// Waiting for disk to be detached
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk detachment check failed")
}
}
}
// Updating the result details
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
common.SetTargets(diskName, "detached", "VirtualDisk", chaosDetails)
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos duration
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Attaching the virtual disks to the instance
log.Info("[Chaos]: Attaching the Virtual disks back to the instances")
for instanceName, diskNameList := range attachedDisksWithInstance {
if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
return stacktrace.Propagate(err, "virtual disk attachment failed")
}
// Wait for disk to be attached
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk attachment check failed")
}
}
}
// Updating the result details
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
}
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// injectChaosInSerialMode will inject the Azure disk loss chaos in serial mode, i.e., one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disks"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for instanceName, diskNameList := range instanceNamesWithDiskNames {
for i, diskName := range diskNameList {
// Converting diskName to list type because DetachDisks() accepts a list type
diskNameToList := []string{diskName}
// Detaching the virtual disks
log.Infof("[Chaos]: Detaching %v from the instance", diskName)
if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameToList); err != nil {
return stacktrace.Propagate(err, "failed to detach disks")
}
// Waiting for disk to be detached
log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk detachment check failed")
}
common.SetTargets(diskName, "detached", "VirtualDisk", chaosDetails)
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos duration
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Attaching the virtual disks to the instance
log.Infof("[Chaos]: Attaching %v back to the instance", diskName)
if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, attachedDisksWithInstance[instanceName]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
// Waiting for disk to be attached
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk attachment check failed")
}
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, attachedDisksWithInstance map[string]*[]compute.DataDisk, instanceNamesWithDiskNames map[string][]string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
log.Info("[Abort]: Attaching disk(s) as abort signal received")
for instanceName, diskList := range attachedDisksWithInstance {
// Checking for provisioning state of the vm instances
err = retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
status, err := instanceStatus.GetAzureInstanceProvisionStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet)
if err != nil {
return stacktrace.Propagate(err, "failed to get instance")
}
if status != "Provisioning succeeded" {
return stacktrace.Propagate(err, "instance is updating, waiting for instance to finish update")
}
return nil
})
if err != nil {
log.Errorf("[Error]: Instance is still in 'updating' state after timeout, re-attach might fail")
}
log.Infof("[Abort]: Attaching disk(s) to instance: %v", instanceName)
for _, disk := range *diskList {
diskStatusString, err := diskStatus.GetDiskStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, *disk.Name)
if err != nil {
log.Errorf("Failed to get disk status: %v", err)
}
if diskStatusString != "Attached" {
if err := diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskList); err != nil {
log.Errorf("Failed to attach disk, manual revert required: %v", err)
} else {
common.SetTargets(*disk.Name, "re-attached", "VirtualDisk", chaosDetails)
}
}
}
}
log.Infof("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
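The abort watcher above polls the instance provisioning state with a timeout/delay budget via the retry helper; the same pattern is sketched below as a plain loop with a placeholder condition, not the project's retry package.

package main

import (
    "errors"
    "fmt"
    "time"
)

// pollUntil re-checks a condition every delay until it holds or the timeout
// budget is spent.
func pollUntil(timeout, delay time.Duration, check func() error) error {
    deadline := time.Now().Add(timeout)
    for {
        err := check()
        if err == nil {
            return nil
        }
        if time.Now().After(deadline) {
            return fmt.Errorf("condition not met within %s: %w", timeout, err)
        }
        time.Sleep(delay)
    }
}

func main() {
    attempts := 0
    err := pollUntil(10*time.Second, 2*time.Second, func() error {
        attempts++
        if attempts < 3 {
            return errors.New("instance still updating")
        }
        return nil
    })
    fmt.Println("poll result:", err)
}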


@@ -0,0 +1,293 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
azureCommon "github.com/litmuschaos/litmus-go/pkg/cloud/azure/common"
azureStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareAzureStop will initialize instanceNameList and start chaos injection based on sequence method selected
func PrepareAzureStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// get the instance name or list of instance names
instanceNameList := strings.Split(experimentsDetails.AzureInstanceNames, ",")
if experimentsDetails.AzureInstanceNames == "" || len(instanceNameList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance name found to stop"}
}
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceNameList)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will inject the Azure instance termination in serial mode, i.e., one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceName list, %v", instanceNameList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// PowerOff the instance serially
for i, vmName := range instanceNameList {
// Stopping the Azure instance
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to stop the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to stop the Azure instance")
}
}
// Wait for Azure instance to completely stop
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "instance poweroff status check failed")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for Chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
// Starting the Azure instance
log.Info("[Chaos]: Starting back the Azure instance")
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
}
// Wait for Azure instance to get in running state
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "instance power on status check failed")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode will inject the Azure instance termination in parallel mode, i.e., all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInParallelMode")
defer span.End()
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceName list, %v", instanceNameList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// PowerOff the instances parallelly
for _, vmName := range instanceNameList {
// Stopping the Azure instance
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to stop Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to stop Azure instance")
}
}
}
// Wait for all Azure instances to completely stop
for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "instance poweroff status check failed")
}
}
// Run probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for Chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
// Starting the Azure instance
for _, vmName := range instanceNameList {
log.Infof("[Chaos]: Starting back the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
}
}
// Wait for Azure instance to get in running state
for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return stacktrace.Propagate(err, "instance power on status check failed")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string) {
<-abort
var instanceState string
log.Info("[Abort]: Chaos Revert Started")
for _, vmName := range instanceNameList {
if experimentsDetails.ScaleSet == "enable" {
scaleSetName, vmId := azureCommon.GetScaleSetNameAndInstanceId(vmName)
instanceState, err = azureStatus.GetAzureScaleSetInstanceStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, scaleSetName, vmId)
} else {
instanceState, err = azureStatus.GetAzureInstanceStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
}
if err != nil {
log.Errorf("[Abort]: Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "VM running" && instanceState != "VM starting" {
log.Info("[Abort]: Waiting for the Azure instance to get down")
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Instance power off status check failed: %v", err)
}
log.Info("[Abort]: Starting Azure instance as abort signal received")
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
}
}
}
log.Info("[Abort]: Waiting for the Azure instance to start")
err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
if err != nil {
log.Errorf("[Abort]: Instance power on status check failed: %v", err)
log.Errorf("[Abort]: Azure instance %v failed to start after an abort signal is received", vmName)
}
}
log.Infof("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
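Note: the abort channel and the err variable consumed by abortWatcher above are package-level declarations of this chaos library that fall outside the hunks shown here. A minimal sketch of the assumed wiring, mirroring the signal-handling pattern visible in the disk-fill helper later in this diff (the variable names and signal set are assumptions for illustration, not the exact upstream code):

// Sketch only: assumed setup of the abort channel consumed by abortWatcher.
package lib

import (
	"os"
	"os/signal"
	"syscall"
)

var (
	abort chan os.Signal
	err   error
)

func init() {
	// abort channel is used to transmit signal notifications.
	abort = make(chan os.Signal, 1)
	// Catch and relay SIGINT/SIGTERM to the abort channel.
	signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
}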

View File

@@ -1,220 +1,221 @@
package main
package helper
import (
"bytes"
"os"
"context"
"fmt"
"os/exec"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/openebs/maya/pkg/util/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
)
func main() {
var err error
// Helper injects the container-kill chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateContainerKillFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
clients := clients.ClientSets{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Fatalf("Unable to Get the kubeconfig, err: %v", err)
}
resultDetails := types.ResultDetails{}
//Fetching all the ENV passed in the helper pod
log.Info("[PreReq]: Getting the ENV variables")
GetENV(&experimentsDetails, "container-kill")
getENV(&experimentsDetails)
// Intialise the chaos attributes
experimentEnv.InitialiseChaosVariables(&chaosDetails, &experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
err := KillContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails)
if err != nil {
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
if err := killContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// KillContainer kill the random application container
// killContainer kills the random application container
// it will keep killing the container till the chaos duration elapses
// the execution will stop once the elapsed time exceeds the given chaos duration
func KillContainer(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
// getting the current timestamp, it will help to keep track of the total chaos duration
ChaosStartTimeStamp := time.Now().Unix()
var targets []targetDetails
for iteration := 0; iteration < experimentsDetails.Iterations; iteration++ {
//GetRestartCount return the restart count of target container
restartCountBefore, err := GetRestartCount(experimentsDetails, experimentsDetails.TargetPods, clients)
if err != nil {
return err
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
targets = append(targets, td)
log.Infof("Injecting chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
//Obtain the container ID through Pod
// this id will be used to select the container for the kill
containerID, err := GetContainerID(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to get the container id, %v", err)
}
if err := killIterations(targets, experimentsDetails, clients, eventsDetails, chaosDetails, resultDetails); err != nil {
return err
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": experimentsDetails.TargetPods,
"ContainerName": experimentsDetails.TargetContainer,
"RestartCountBefore": restartCountBefore,
})
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngne")
}
func killIterations(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
switch experimentsDetails.ContainerRuntime {
case "docker":
if err := StopDockerContainer(containerID, experimentsDetails.SocketPath, experimentsDetails.Signal); err != nil {
return err
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
var containerIds []string
for _, t := range targets {
t.RestartCountBefore, err = getRestartCount(t, clients)
if err != nil {
return stacktrace.Propagate(err, "could get container restart count")
}
case "containerd", "crio":
if err := StopContainerdContainer(containerID, experimentsDetails.SocketPath); err != nil {
return err
containerId, err := common.GetContainerID(t.Namespace, t.Name, t.TargetContainer, clients, t.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
default:
return errors.Errorf("%v container runtime not supported", experimentsDetails.ContainerRuntime)
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": t.Name,
"ContainerName": t.TargetContainer,
"RestartCountBefore": t.RestartCountBefore,
})
containerIds = append(containerIds, containerId)
}
if err := kill(experimentsDetails, containerIds, clients, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not kill target container")
}
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != 0 {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
waitForChaosInterval(experimentsDetails)
common.WaitForDuration(experimentsDetails.ChaosInterval)
}
//Check the status of restarted container
err = CheckContainerStatus(experimentsDetails, clients, experimentsDetails.TargetPods)
if err != nil {
return errors.Errorf("Application container is not in running state, %v", err)
}
// It will verify that the restart count of container should increase after chaos injection
err = VerifyRestartCount(experimentsDetails, experimentsDetails.TargetPods, clients, restartCountBefore)
if err != nil {
return err
}
// generating the total duration of the experiment run
ChaosCurrentTimeStamp := time.Now().Unix()
chaosDiffTimeStamp := ChaosCurrentTimeStamp - ChaosStartTimeStamp
// terminating the execution after the timestamp exceed the total chaos duration
if int(chaosDiffTimeStamp) >= experimentsDetails.ChaosDuration {
break
}
}
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil
}
//GetContainerID derive the container id of the application container
func GetContainerID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(experimentsDetails.TargetPods, v1.GetOptions{})
if err != nil {
return "", err
}
var containerID string
// filtering out the container id from the details of containers inside containerStatuses of the given pod
// container id is present in the form of <runtime>://<container-id>
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
containerID = strings.Split(container.ContainerID, "//")[1]
break
}
}
log.Infof("container ID of app container under test: %v", containerID)
return containerID, nil
}
//StopContainerdContainer kill the application container
func StopContainerdContainer(containerID, socketPath string) error {
var errOut bytes.Buffer
endpoint := "unix://" + socketPath
cmd := exec.Command("crictl", "-i", endpoint, "-r", endpoint, "stop", string(containerID))
cmd.Stderr = &errOut
if err := cmd.Run(); err != nil {
return errors.Errorf("Unable to run command, err: %v; error output: %v", err, errOut.String())
}
return nil
}
//StopDockerContainer kill the application container
func StopDockerContainer(containerID, socketPath, signal string) error {
var errOut bytes.Buffer
host := "unix://" + socketPath
cmd := exec.Command("docker", "--host", host, "kill", string(containerID), "--signal", signal)
cmd.Stderr = &errOut
if err := cmd.Run(); err != nil {
return errors.Errorf("Unable to run command, err: %v; error output: %v", err, errOut.String())
}
return nil
}
// CheckContainerStatus checks the status of the application container
func CheckContainerStatus(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName string) error {
err := retry.
Times(90).
Wait(2 * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to find the pod with name %v, err: %v", appName, err)
for _, t := range targets {
if err := validate(t, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not verify restart count")
}
for _, container := range pod.Status.ContainerStatuses {
if container.Ready != true {
return errors.Errorf("containers are not yet in running state")
}
log.InfoWithValues("The running status of container are as follows", logrus.Fields{
"container": container.Name, "Pod": pod.Name, "Status": pod.Status.Phase})
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "targeted", "pod", t.Name); err != nil {
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
return nil
})
if err != nil {
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
func kill(experimentsDetails *experimentTypes.ExperimentDetails, containerIds []string, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch experimentsDetails.ContainerRuntime {
case "docker":
if err := stopDockerContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
case "containerd", "crio":
if err := stopContainerdContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName, experimentsDetails.Timeout); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported container runtime %s", experimentsDetails.ContainerRuntime)}
}
return nil
}
func validate(t targetDetails, timeout, delay int, clients clients.ClientSets) error {
//Check the status of restarted container
if err := common.CheckContainerStatus(t.Namespace, t.Name, timeout, delay, clients, t.Source); err != nil {
return err
}
return nil
// It will verify that the restart count of the container has increased after chaos injection
return verifyRestartCount(t, timeout, delay, clients, t.RestartCountBefore)
}
//waitForChaosInterval waits for the given chaos interval duration (in seconds)
func waitForChaosInterval(experimentsDetails *experimentTypes.ExperimentDetails) {
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
}
//GetRestartCount return the restart count of target container
func GetRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets) (int, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podName, v1.GetOptions{})
if err != nil {
return 0, err
// stopContainerdContainer kills the application container
func stopContainerdContainer(containerIDs []string, socketPath, signal, source string, timeout int) error {
if signal != "SIGKILL" && signal != "SIGTERM" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: source, Reason: fmt.Sprintf("unsupported signal %s, use either SIGTERM or SIGKILL", signal)}
}
cmd := exec.Command("sudo", "crictl", "-i", fmt.Sprintf("unix://%s", socketPath), "-r", fmt.Sprintf("unix://%s", socketPath), "stop")
if signal == "SIGKILL" {
cmd.Args = append(cmd.Args, "--timeout=0")
} else if timeout != -1 {
cmd.Args = append(cmd.Args, fmt.Sprintf("--timeout=%v", timeout))
}
cmd.Args = append(cmd.Args, containerIDs...)
return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
}
// stopDockerContainer kills the application container
func stopDockerContainer(containerIDs []string, socketPath, signal, source string) error {
cmd := exec.Command("sudo", "docker", "--host", fmt.Sprintf("unix://%s", socketPath), "kill", "--signal", signal)
cmd.Args = append(cmd.Args, containerIDs...)
return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
}
// getRestartCount returns the restart count of the target container
func getRestartCount(target targetDetails, clients clients.ClientSets) (int, error) {
pod, err := clients.GetPod(target.Namespace, target.Name, 180, 2)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", target.Name, target.Namespace), Reason: err.Error()}
}
restartCount := 0
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
if container.Name == target.TargetContainer {
restartCount = int(container.RestartCount)
break
}
@@ -222,60 +223,58 @@ func GetRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podN
return restartCount, nil
}
//VerifyRestartCount verify the restart count of target container that it is restarted or not after chaos injection
func VerifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, restartCountBefore int) error {
// verifyRestartCount verifies that the target container has restarted (restart count increased) after chaos injection
func verifyRestartCount(t targetDetails, timeout, delay int, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0
err := retry.
Times(90).
Wait(1 * time.Second).
return retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(podName, v1.GetOptions{})
pod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to find the pod with name %v, err: %v", podName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
if container.Name == t.TargetContainer {
restartCountAfter = int(container.RestartCount)
break
}
}
if restartCountAfter <= restartCountBefore {
return errors.Errorf("Target container is not restarted")
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: "target container is not restarted after kill"}
}
log.Infof("restartCount of target container after chaos injection: %v", strconv.Itoa(restartCountAfter))
return nil
})
log.Infof("restartCount of target container after chaos injection: %v", strconv.Itoa(restartCountAfter))
return err
}
//GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails, name string) {
experimentDetails.ExperimentName = name
experimentDetails.AppNS = Getenv("APP_NS", "")
experimentDetails.TargetContainer = Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.Iterations, _ = strconv.Atoi(Getenv("ITERATIONS", "3"))
experimentDetails.ChaosNamespace = Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = Getenv("CHAOS_ENGINE", "")
experimentDetails.AppLabel = Getenv("APP_LABEL", "")
experimentDetails.ChaosUID = clientTypes.UID(Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = Getenv("POD_NAME", "")
experimentDetails.SocketPath = Getenv("SOCKET_PATH", "")
experimentDetails.ContainerRuntime = Getenv("CONTAINER_RUNTIME", "")
experimentDetails.Signal = Getenv("SIGNAL", "SIGKILL")
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.Signal = types.Getenv("SIGNAL", "SIGKILL")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.ContainerAPITimeout, _ = strconv.Atoi(types.Getenv("CONTAINER_API_TIMEOUT", "-1"))
}
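With the defaults above (STATUS_CHECK_TIMEOUT=180, STATUS_CHECK_DELAY=2), the retry loop in verifyRestartCount performs 180/2 = 90 attempts spaced two seconds apart, i.e. it waits up to roughly three minutes for the target container's restart count to increase before reporting a failure.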
// Getenv fetch the env and set the default value, if any
func Getenv(key string, defaultValue string) string {
value := os.Getenv(key)
if value == "" {
value = defaultValue
}
return value
type targetDetails struct {
Name string
Namespace string
TargetContainer string
RestartCountBefore int
Source string
}
func isContextDeadlineExceeded(err error) bool {
return strings.Contains(err.Error(), "context deadline exceeded")
}
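To make the container-runtime interaction above concrete, the following standalone sketch (not part of this diff) assembles the same crictl invocation as stopContainerdContainer for assumed example values of the socket path, signal, timeout, and container id, and prints the resulting command instead of executing it:

// Sketch only: illustrates the crictl command built by stopContainerdContainer.
// All concrete values below are assumed examples.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	socketPath := "/run/containerd/containerd.sock" // assumed SOCKET_PATH
	signal := "SIGTERM"                             // assumed SIGNAL
	timeout := 30                                   // assumed CONTAINER_API_TIMEOUT
	containerIDs := []string{"3cc4b1a7e0c1"}        // assumed resolved container id

	cmd := exec.Command("sudo", "crictl", "-i", fmt.Sprintf("unix://%s", socketPath), "-r", fmt.Sprintf("unix://%s", socketPath), "stop")
	if signal == "SIGKILL" {
		cmd.Args = append(cmd.Args, "--timeout=0")
	} else if timeout != -1 {
		cmd.Args = append(cmd.Args, fmt.Sprintf("--timeout=%v", timeout))
	}
	cmd.Args = append(cmd.Args, containerIDs...)

	// Prints: sudo crictl -i unix:///run/containerd/containerd.sock -r unix:///run/containerd/containerd.sock stop --timeout=30 3cc4b1a7e0c1
	fmt.Println(strings.Join(cmd.Args, " "))
}

The docker branch is analogous: for an assumed socket path of /var/run/docker.sock it would run sudo docker --host unix:///var/run/docker.sock kill --signal SIGTERM followed by the container ids.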

View File

@@ -1,36 +1,52 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/math"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareContainerKill contains the prepration steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareContainerKillFault")
defer span.End()
var err error
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//Set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
log.InfoWithValues("[Info]: The tunables are:", logrus.Fields{
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -40,44 +56,30 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
err = GetServiceAccount(experimentsDetails, clients)
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
//Getting the iteration count for the container-kill
GetIterations(experimentsDetails)
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -88,156 +90,98 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
return nil
}
// InjectChaosInSerialMode kill the container of all target application serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode kill the container of all target application serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
labelSuffix := common.GetRunID()
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + runID
runID := stringutils.GetRunID()
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
err := status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.ChaosInterval+60, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// InjectChaosInParallelMode kill the container of all target application in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
// injectChaosInParallelMode kill the container of all target application in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
err := status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.ChaosInterval+60, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
return nil
}
//GetIterations derive the iterations value from given parameters
func GetIterations(experimentsDetails *experimentTypes.ExperimentDetails) {
var Iterations int
if experimentsDetails.ChaosInterval != 0 {
Iterations = experimentsDetails.ChaosDuration / experimentsDetails.ChaosInterval
}
experimentsDetails.Iterations = math.Maximum(Iterations, 1)
}
// GetServiceAccount find the serviceAccountName for the helper pod
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
//GetTargetContainer will fetch the container name from application pod
//This container will be used as target container
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
return pod.Spec.Containers[0].Name, nil
}
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, podName, nodeName, runID, labelSuffix string) error {
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateContainerKillFaultHelperPod")
defer span.End()
privilegedEnable := false
if experimentsDetails.ContainerRuntime == "crio" {
privilegedEnable = true
}
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
Annotations: experimentsDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Volumes: []apiv1.Volume{
{
Name: "cri-socket",
@@ -248,26 +192,6 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
},
},
InitContainers: []apiv1.Container{
{
Name: "setup-" + experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
"-c",
"sudo chmod 777 " + experimentsDetails.SocketPath,
},
Resources: experimentsDetails.Resources,
Env: GetPodEnv(experimentsDetails, podName),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
@@ -278,10 +202,10 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
Args: []string{
"-c",
"./helper/container-killer",
"./helpers -name container-kill",
},
Resources: experimentsDetails.Resources,
Env: GetPodEnv(experimentsDetails, podName),
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
@@ -296,52 +220,46 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// GetPodEnv derive all the env required for the helper pod
func GetPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
// getPodEnv derives all the env required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envVar []apiv1.EnvVar
ENVList := map[string]string{
"APP_NS": experimentsDetails.AppNS,
"APP_POD": podName,
"APP_CONTAINER": experimentsDetails.TargetContainer,
"TOTAL_CHAOS_DURATION": strconv.Itoa(experimentsDetails.ChaosDuration),
"CHAOS_NAMESPACE": experimentsDetails.ChaosNamespace,
"CHAOS_ENGINE": experimentsDetails.EngineName,
"CHAOS_UID": string(experimentsDetails.ChaosUID),
"CHAOS_INTERVAL": strconv.Itoa(experimentsDetails.ChaosInterval),
"ITERATIONS": strconv.Itoa(experimentsDetails.Iterations),
"SOCKET_PATH": experimentsDetails.SocketPath,
"CONTAINER_RUNTIME": experimentsDetails.ContainerRuntime,
"SIGNAL": experimentsDetails.Signal,
}
for key, value := range ENVList {
var perEnv apiv1.EnvVar
perEnv.Name = key
perEnv.Value = value
envVar = append(envVar, perEnv)
}
// Getting experiment pod name from downward API
experimentPodName := GetValueFromDownwardAPI("v1", "metadata.name")
var downwardEnv apiv1.EnvVar
downwardEnv.Name = "POD_NAME"
downwardEnv.ValueFrom = &experimentPodName
envVar = append(envVar, downwardEnv)
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CHAOS_INTERVAL", strconv.Itoa(experimentsDetails.ChaosInterval)).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("SIGNAL", experimentsDetails.Signal).
SetEnv("STATUS_CHECK_DELAY", strconv.Itoa(experimentsDetails.Delay)).
SetEnv("STATUS_CHECK_TIMEOUT", strconv.Itoa(experimentsDetails.Timeout)).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("CONTAINER_API_TIMEOUT", strconv.Itoa(experimentsDetails.ContainerAPITimeout)).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envVar
return envDetails.ENV
}
// GetValueFromDownwardAPI returns the value from downwardApi
func GetValueFromDownwardAPI(apiVersion string, fieldPath string) apiv1.EnvVarSource {
downwardENV := apiv1.EnvVarSource{
FieldRef: &apiv1.ObjectFieldSelector{
APIVersion: apiVersion,
FieldPath: fieldPath,
},
}
return downwardENV
// SetChaosTunables will set up a random value within the given range of values
// If the value is not provided as a range, it'll keep the initially provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
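For reference, createHelperPod above hands the selected targets to the helper through a single TARGETS environment variable made of semicolon-separated name:namespace:container triples (see the fmt.Sprintf("%s:%s:%s", ...) calls in the serial and parallel injectors). The helper decodes that value via common.ParseTargets; a minimal, hypothetical decoder for the same format, shown only to illustrate the contract between the library and the helper:

// Sketch only: a hypothetical decoder for the TARGETS format produced above.
// The real implementation is common.ParseTargets in pkg/utils/common.
package main

import (
	"fmt"
	"strings"
)

type target struct {
	Name, Namespace, TargetContainer string
}

func parseTargets(raw string) ([]target, error) {
	var targets []target
	for _, entry := range strings.Split(raw, ";") {
		parts := strings.Split(entry, ":")
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed target %q, expected name:namespace:container", entry)
		}
		targets = append(targets, target{Name: parts[0], Namespace: parts[1], TargetContainer: parts[2]})
	}
	return targets, nil
}

func main() {
	// Assumed example value of the TARGETS env variable.
	targets, err := parseTargets("nginx-7f5d8:default:nginx;nginx-9c2ab:default:nginx")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", targets)
}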

View File

@@ -1,6 +1,7 @@
package main
package helper
import (
"context"
"fmt"
"os"
"os/exec"
@@ -10,39 +11,53 @@ import (
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
"k8s.io/apimachinery/pkg/api/resource"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
)
func main() {
var inject, abort chan os.Signal
// Helper injects the disk-fill chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateDiskFillFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
clients := clients.ClientSets{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Fatalf("Unable to Get the kubeconfig, err: %v", err)
}
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed in the helper pod
log.Info("[PreReq]: Getting the ENV variables")
GetENV(&experimentsDetails, "disk-fill")
getENV(&experimentsDetails)
// Initialise the chaos attributes
experimentEnv.InitialiseChaosVariables(&chaosDetails, &experimentsDetails)
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
@@ -50,57 +65,60 @@ func main() {
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
err := DiskFill(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
if err := diskFill(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
//DiskFill contains steps to inject disk-fill chaos
func DiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
// GetEphemeralStorageAttributes derive the ephemeral storage attributes from the target container
ephemeralStorageLimit, ephemeralStorageRequest, err := GetEphemeralStorageAttributes(experimentsDetails, clients)
// diskFill contains steps to inject disk-fill chaos
func diskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return err
return stacktrace.Propagate(err, "could not parse targets")
}
// Derive the container id of the target container
containerID, err := GetContainerID(experimentsDetails, clients)
if err != nil {
return err
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
// Derive the container id of the target container
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.TargetPID, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return err
}
td.SizeToFill, err = getDiskSizeToFill(td, experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get disk size to fill")
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"SizeToFill(KB)": td.SizeToFill,
"TargetContainer": td.TargetContainer,
})
targets = append(targets, td)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": experimentsDetails.TargetPods,
"ContainerName": experimentsDetails.TargetContainer,
"ephemeralStorageLimit": ephemeralStorageLimit,
"ephemeralStorageRequest": ephemeralStorageRequest,
"ContainerID": containerID,
})
// derive the used ephemeral storage size from the target container
du := fmt.Sprintf("sudo du /diskfill/%v", containerID)
cmd := exec.Command("/bin/bash", "-c", du)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(string(out))
return err
}
ephemeralStorageDetails := string(out)
// filtering out the used ephemeral storage from the output of du command
usedEphemeralStorageSize, err := FilterUsedEphemeralStorage(ephemeralStorageDetails)
if err != nil {
return errors.Errorf("Unable to filter used ephemeral storage size, err: %v", err)
}
log.Infof("used ephemeral storage space: %v", strconv.Itoa(usedEphemeralStorageSize))
// deriving the ephemeral storage size to be filled
sizeTobeFilled := GetSizeToBeFilled(experimentsDetails, usedEphemeralStorageSize, int(ephemeralStorageLimit))
log.Infof("ephemeral storage size to be filled: %v", strconv.Itoa(sizeTobeFilled))
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
@@ -108,193 +126,249 @@ func DiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients cli
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if sizeTobeFilled > 0 {
// watching for the abort signal and revert the chaos
go abortWatcher(targets, experimentsDetails, clients, resultDetails.Name)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
// Creating files to fill the required ephemeral storage size of block size of 4K
dd := fmt.Sprintf("sudo dd if=/dev/urandom of=/diskfill/%v/diskfill bs=4K count=%v", containerID, strconv.Itoa(sizeTobeFilled/4))
cmd := exec.Command("/bin/bash", "-c", dd)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(string(out))
return err
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
// updating the chaosresult after stopped
failStep := "Disk Fill Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
err = Remedy(experimentsDetails, clients, containerID)
if err != nil {
return errors.Errorf("Unable to perform remedy operation due to %v", err)
}
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
for _, t := range targets {
if t.SizeToFill > 0 {
if err := fillDisk(t, experimentsDetails.DataBlockSize); err != nil {
return stacktrace.Propagate(err, "could not fill ephemeral storage")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertDiskFill(t, clients); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
} else {
log.Warn("No required free space found!")
}
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment")
var errList []string
for _, t := range targets {
// It will delete the target pod if target pod is evicted
// if target pod is still running then it will delete all the files, which was created earlier during chaos execution
err = Remedy(experimentsDetails, clients, containerID)
if err != nil {
return errors.Errorf("Unable to perform remedy operation due to %v", err)
if err = revertDiskFill(t, clients); err != nil {
errList = append(errList, err.Error())
continue
}
} else {
log.Warn("No required free space found!, It's Housefull")
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// GetEphemeralStorageAttributes derive the ephemeral storage attributes from the target pod
func GetEphemeralStorageAttributes(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int64, int64, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(experimentsDetails.TargetPods, v1.GetOptions{})
// fillDisk fills the ephemeral disk by creating files
func fillDisk(t targetDetails, bs int) error {
// Creating files to fill the required ephemeral storage size of block size of 4K
log.Infof("[Fill]: Filling ephemeral storage, size: %vKB", t.SizeToFill)
dd := fmt.Sprintf("sudo dd if=/dev/urandom of=/proc/%v/root/home/diskfill bs=%vK count=%v", t.TargetPID, bs, strconv.Itoa(t.SizeToFill/bs))
log.Infof("dd: {%v}", dd)
cmd := exec.Command("/bin/bash", "-c", dd)
out, err := cmd.CombinedOutput()
if err != nil {
return 0, 0, err
log.Error(err.Error())
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: string(out)}
}
return nil
}
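For example, with an assumed target PID of 1234, a 4 KB block size, and 1,000,000 KB left to fill, the command assembled above becomes sudo dd if=/dev/urandom of=/proc/1234/root/home/diskfill bs=4K count=250000, i.e. the fill size divided by the block size gives the dd count.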
// getEphemeralStorageAttributes derives the ephemeral storage attributes from the target pod
func getEphemeralStorageAttributes(t targetDetails, clients clients.ClientSets) (int64, error) {
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
var ephemeralStorageLimit, ephemeralStorageRequest int64
var ephemeralStorageLimit int64
containers := pod.Spec.Containers
// Extracting ephemeral storage limit & requested value from the target container
// It will be in the form of Kb
for _, container := range containers {
if container.Name == experimentsDetails.TargetContainer {
if container.Name == t.TargetContainer {
ephemeralStorageLimit = container.Resources.Limits.StorageEphemeral().ToDec().ScaledValue(resource.Kilo)
ephemeralStorageRequest = container.Resources.Requests.StorageEphemeral().ToDec().ScaledValue(resource.Kilo)
break
}
}
if ephemeralStorageRequest == 0 || ephemeralStorageLimit == 0 {
return 0, 0, fmt.Errorf("No Ephemeral storage details found inside %v container", experimentsDetails.TargetContainer)
}
return ephemeralStorageLimit, ephemeralStorageRequest, nil
return ephemeralStorageLimit, nil
}
// GetContainerID derive the container id of the target container
func GetContainerID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(experimentsDetails.TargetPods, v1.GetOptions{})
if err != nil {
return "", err
}
var containerID string
containers := pod.Status.ContainerStatuses
// filtering out the container id from the details of containers inside containerStatuses of the given pod
// container id is present in the form of <runtime>://<container-id>
for _, container := range containers {
if container.Name == experimentsDetails.TargetContainer {
containerID = strings.Split(container.ContainerID, "//")[1]
break
}
}
return containerID, nil
}
// FilterUsedEphemeralStorage filter out the used ephemeral storage from the given string
func FilterUsedEphemeralStorage(ephemeralStorageDetails string) (int, error) {
// filterUsedEphemeralStorage filter out the used ephemeral storage from the given string
func filterUsedEphemeralStorage(ephemeralStorageDetails string) (int, error) {
// Filtering out the ephemeral storage size from the output of du command
// It contains details of all subdirectories of target container
ephemeralStorageAll := strings.Split(ephemeralStorageDetails, "\n")
// It will return the details of the main directory
ephemeralStorageAllDiskFill := strings.Split(ephemeralStorageAll[len(ephemeralStorageAll)-2], "\t")[0]
// type casting string to interger
// type casting string to integer
ephemeralStorageSize, err := strconv.Atoi(ephemeralStorageAllDiskFill)
return ephemeralStorageSize, err
}
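A small self-contained sketch of the parsing done by filterUsedEphemeralStorage, using made-up du output: because the command output ends with a newline, the last split element is empty, so the summary line for the root directory sits at index len-2 and its first tab-separated field is the used size in KB.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

func main() {
	// made-up du output; the trailing newline means the last split element is empty
	out := "120\t/proc/12345/root/var\n340\t/proc/12345/root/home\n9876\t/proc/12345/root\n"

	lines := strings.Split(out, "\n")
	summary := lines[len(lines)-2]               // "9876\t/proc/12345/root"
	sizeField := strings.Split(summary, "\t")[0] // "9876"

	usedKB, err := strconv.Atoi(sizeField)
	if err != nil {
		fmt.Println("unexpected du output:", err)
		return
	}
	fmt.Printf("used ephemeral storage: %dKB\n", usedKB)
}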
// GetSizeToBeFilled generate the ephemeral storage size need to be filled
func GetSizeToBeFilled(experimentsDetails *experimentTypes.ExperimentDetails, usedEphemeralStorageSize int, ephemeralStorageLimit int) int {
// deriving size need to be filled from the used size & requirement size to fill
requirementToBeFill := (ephemeralStorageLimit * experimentsDetails.FillPercentage) / 100
needToBeFilled := requirementToBeFill - usedEphemeralStorageSize
return needToBeFilled
}
// getSizeToBeFilled generate the ephemeral storage size need to be filled
func getSizeToBeFilled(experimentsDetails *experimentTypes.ExperimentDetails, usedEphemeralStorageSize int, ephemeralStorageLimit int) int {
var requirementToBeFill int
switch ephemeralStorageLimit {
case 0:
ephemeralStorageMebibytes, _ := strconv.Atoi(experimentsDetails.EphemeralStorageMebibytes)
requirementToBeFill = ephemeralStorageMebibytes * 1024
default:
// deriving size need to be filled from the used size & requirement size to fill
fillPercentage, _ := strconv.Atoi(experimentsDetails.FillPercentage)
requirementToBeFill = (ephemeralStorageLimit * fillPercentage) / 100
}
needToBeFilled := requirementToBeFill - usedEphemeralStorageSize
return needToBeFilled
}
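As a worked example of getSizeToBeFilled (numbers are illustrative): with a 2,000,000KB ephemeral-storage limit, FILL_PERCENTAGE of 80, and 150,000KB already used, 1,450,000KB remains to be filled; with no limit set, EPHEMERAL_STORAGE_MEBIBYTES is converted to KB instead. A plain-int sketch of that branch logic:
package main

import "fmt"

// sizeToBeFilled mirrors the branch logic above with plain ints (illustrative only)
func sizeToBeFilled(limitKB, fillPercentage, ephemeralStorageMiB, usedKB int) int {
	var requirement int
	if limitKB == 0 {
		requirement = ephemeralStorageMiB * 1024 // MiB -> KB
	} else {
		requirement = (limitKB * fillPercentage) / 100
	}
	return requirement - usedKB
}

func main() {
	// limit-based: (2,000,000 * 80) / 100 - 150,000 = 1,450,000 KB
	fmt.Println(sizeToBeFilled(2000000, 80, 0, 150000))
	// absolute-size based: 500 MiB -> 512,000 KB, minus 150,000 KB used = 362,000 KB
	fmt.Println(sizeToBeFilled(0, 0, 500, 150000))
}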
// Remedy will delete the target pod if target pod is evicted
// if target pod is still running then it will delete the files, which was created during chaos execution
func Remedy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, containerID string) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(experimentsDetails.TargetPods, v1.GetOptions{})
if err != nil {
return err
}
// Deleting the pod as pod is already evicted
podReason := pod.Status.Reason
if podReason == "Evicted" {
if err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(experimentsDetails.TargetPods, &v1.DeleteOptions{}); err != nil {
return err
}
} else {
// deleting the files after chaos execution
rm := fmt.Sprintf("sudo rm -rf /diskfill/%v/diskfill", containerID)
cmd := exec.Command("/bin/bash", "-c", rm)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(string(out))
return err
}
}
return nil
}
// revertDiskFill will delete the target pod if target pod is evicted
// if target pod is still running then it will delete the files, which was created during chaos execution
func revertDiskFill(t targetDetails, clients clients.ClientSets) error {
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
podReason := pod.Status.Reason
if podReason == "Evicted" {
// Deleting the pod as pod is already evicted
log.Warn("Target pod is evicted, deleting the pod")
if err := clients.KubeClient.CoreV1().Pods(t.Namespace).Delete(context.Background(), t.Name, v1.DeleteOptions{}); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to delete target pod after eviction :%s", err.Error())}
}
} else {
// deleting the files after chaos execution
rm := fmt.Sprintf("sudo rm -rf /proc/%v/root/home/diskfill", t.TargetPID)
cmd := exec.Command("/bin/bash", "-c", rm)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to cleanup ephemeral storage: %s", string(out))}
}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
return nil
}
//GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails, name string) {
experimentDetails.ExperimentName = name
experimentDetails.AppNS = Getenv("APP_NS", "")
experimentDetails.TargetContainer = Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = Getenv("CHAOS_ENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = Getenv("POD_NAME", "")
experimentDetails.FillPercentage, _ = strconv.Atoi(Getenv("FILL_PERCENTAGE", ""))
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.FillPercentage = types.Getenv("FILL_PERCENTAGE", "")
experimentDetails.EphemeralStorageMebibytes = types.Getenv("EPHEMERAL_STORAGE_MEBIBYTES", "")
experimentDetails.DataBlockSize, _ = strconv.Atoi(types.Getenv("DATA_BLOCK_SIZE", "256"))
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
}
// Getenv fetch the env and set the default value, if any
func Getenv(key string, defaultValue string) string {
value := os.Getenv(key)
if value == "" {
value = defaultValue
}
return value
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultName string) {
// waiting till the abort signal received
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
err := revertDiskFill(t, clients)
if err != nil {
log.Errorf("unable to kill disk-fill process, err :%v", err)
continue
}
if err = result.AnnotateChaosResult(resultName, experimentsDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
func getDiskSizeToFill(t targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int, error) {
usedEphemeralStorageSize, err := getUsedEphemeralStorage(t)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get used ephemeral storage")
}
// getEphemeralStorageAttributes derives the ephemeral storage attributes from the target container
ephemeralStorageLimit, err := getEphemeralStorageAttributes(t, clients)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get ephemeral storage attributes")
}
if ephemeralStorageLimit == 0 && experimentsDetails.EphemeralStorageMebibytes == "0" {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: "either provide ephemeral storage limit inside target container or define EPHEMERAL_STORAGE_MEBIBYTES ENV"}
}
// deriving the ephemeral storage size to be filled
sizeTobeFilled := getSizeToBeFilled(experimentsDetails, usedEphemeralStorageSize, int(ephemeralStorageLimit))
return sizeTobeFilled, nil
}
func getUsedEphemeralStorage(t targetDetails) (int, error) {
// derive the used ephemeral storage size from the target container
du := fmt.Sprintf("sudo du /proc/%v/root", t.TargetPID)
cmd := exec.Command("/bin/bash", "-c", du)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", string(out))}
}
ephemeralStorageDetails := string(out)
// filtering out the used ephemeral storage from the output of du command
usedEphemeralStorageSize, err := filterUsedEphemeralStorage(ephemeralStorageDetails)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", err.Error())}
}
log.Infof("used ephemeral storage space: %vKB", strconv.Itoa(usedEphemeralStorageSize))
return usedEphemeralStorageSize, nil
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
SizeToFill int
TargetPID int
Source string
}
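The helper receives its targets through the TARGETS environment variable in the podName:namespace:containerName form built by createHelperPod below, with multiple entries joined by ";". The decoder sketched here is hypothetical and only illustrates that format; the actual parsing code is not part of this diff.
package main

import (
	"fmt"
	"strings"
)

// target is a cut-down stand-in for the targetDetails struct above (illustrative)
type target struct {
	Name, Namespace, TargetContainer string
}

// parseTargets is a hypothetical decoder for the "name:ns:container;name:ns:container" format
func parseTargets(raw string) ([]target, error) {
	var out []target
	for _, entry := range strings.Split(raw, ";") {
		parts := strings.Split(entry, ":")
		if len(parts) != 3 {
			return nil, fmt.Errorf("unexpected target entry %q", entry)
		}
		out = append(out, target{Name: parts[0], Namespace: parts[1], TargetContainer: parts[2]})
	}
	return out, nil
}

func main() {
	targets, err := parseTargets("nginx-0:default:nginx;nginx-1:default:nginx")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", targets)
}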

View File

@ -1,38 +1,57 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareDiskFill contains the prepration steps before chaos injection
func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareDiskFill contains the preparation steps before chaos injection
func PrepareDiskFill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDiskFillFault")
defer span.End()
// It will contains all the pod & container details required for exec command
var err error
// It will contain all the pod & container details required for exec command
execCommandDetails := exec.PodDetails{}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"FillPercentage": experimentsDetails.FillPercentage,
"EphemeralStorageMebibytes": experimentsDetails.EphemeralStorageMebibytes,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -40,43 +59,32 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
common.WaitForDuration(experimentsDetails.RampTime)
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
err = GetServiceAccount(experimentsDetails, clients)
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails); err != nil {
return err
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails); err != nil {
return err
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -87,42 +95,34 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
return nil
}
// InjectChaosInSerialMode fill the ephemeral storage of all target application serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails) error {
// injectChaosInSerialMode fill the ephemeral storage of all target application serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
labelSuffix := common.GetRunID()
// creating the helper pod to perform disk-fill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
err := CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + runID
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+60, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@ -130,97 +130,70 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// InjectChaosInParallelMode fills the ephemeral storage of all target applications in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails) error {
// injectChaosInParallelMode fills the ephemeral storage of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
// creating the helper pod to perform disk-fill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
err := CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
err := status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+60, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting all the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, %v", err)
}
return nil
}
// GetServiceAccount find the serviceAccountName for the helper pod
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
//GetTargetContainer will fetch the container name from application pod
// It will return the first container name from the application pod
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, appNodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDiskFillFaultHelperPod")
defer span.End()
return pod.Spec.Containers[0].Name, nil
}
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID, labelSuffix string) error {
mountPropagationMode := apiv1.MountPropagationHostToContainer
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
Annotations: experimentsDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
HostPID: true,
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Volumes: []apiv1.Volume{
{
Name: "udev",
Name: "socket-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.ContainerPath,
Path: experimentsDetails.SocketPath,
},
},
},
@ -235,64 +208,65 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
Args: []string{
"-c",
"./helper/disk-fill",
"./helpers -name disk-fill",
},
Resources: experimentsDetails.Resources,
Env: GetPodEnv(experimentsDetails, appName),
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "udev",
MountPath: "/diskfill",
MountPropagation: &mountPropagationMode,
Name: "socket-path",
MountPath: experimentsDetails.SocketPath,
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// GetPodEnv derive all the env required for the helper pod
func GetPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
// getPodEnv derive all the env required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envVar []apiv1.EnvVar
ENVList := map[string]string{
"APP_NS": experimentsDetails.AppNS,
"APP_POD": podName,
"APP_CONTAINER": experimentsDetails.TargetContainer,
"TOTAL_CHAOS_DURATION": strconv.Itoa(experimentsDetails.ChaosDuration),
"CHAOS_NAMESPACE": experimentsDetails.ChaosNamespace,
"CHAOS_ENGINE": experimentsDetails.EngineName,
"CHAOS_UID": string(experimentsDetails.ChaosUID),
"EXPERIMENT_NAME": experimentsDetails.ExperimentName,
"FILL_PERCENTAGE": strconv.Itoa(experimentsDetails.FillPercentage),
}
for key, value := range ENVList {
var perEnv apiv1.EnvVar
perEnv.Name = key
perEnv.Value = value
envVar = append(envVar, perEnv)
}
// Getting experiment pod name from downward API
experimentPodName := GetValueFromDownwardAPI("v1", "metadata.name")
var downwardEnv apiv1.EnvVar
downwardEnv.Name = "POD_NAME"
downwardEnv.ValueFrom = &experimentPodName
envVar = append(envVar, downwardEnv)
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("FILL_PERCENTAGE", experimentsDetails.FillPercentage).
SetEnv("EPHEMERAL_STORAGE_MEBIBYTES", experimentsDetails.EphemeralStorageMebibytes).
SetEnv("DATA_BLOCK_SIZE", strconv.Itoa(experimentsDetails.DataBlockSize)).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envVar
return envDetails.ENV
}
// GetValueFromDownwardAPI returns the value from downwardApi
func GetValueFromDownwardAPI(apiVersion string, fieldPath string) apiv1.EnvVarSource {
downwardENV := apiv1.EnvVarSource{
FieldRef: &apiv1.ObjectFieldSelector{
APIVersion: apiVersion,
FieldPath: fieldPath,
},
}
return downwardENV
// setChaosTunables will set up a random value within the given range of values
// If the value is not provided as a range, it keeps the initially provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.FillPercentage = common.ValidateRange(experimentsDetails.FillPercentage)
experimentsDetails.EphemeralStorageMebibytes = common.ValidateRange(experimentsDetails.EphemeralStorageMebibytes)
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
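setChaosTunables relies on common.ValidateRange to resolve tunables that may be given either as a plain value or as a range; the sketch below is a hypothetical re-implementation of that idea (pick a random value from a min-max string, otherwise return the input unchanged), not the library code itself.
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
)

// validateRange is a hypothetical stand-in for common.ValidateRange:
// "50" stays "50", while "30-70" resolves to a random value in [30,70].
func validateRange(value string) string {
	parts := strings.Split(value, "-")
	if len(parts) != 2 {
		return value
	}
	min, err1 := strconv.Atoi(parts[0])
	max, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil || max < min {
		return value
	}
	return strconv.Itoa(min + rand.Intn(max-min+1))
}

func main() {
	fmt.Println(validateRange("80"))    // plain value passes through
	fmt.Println(validateRange("30-70")) // resolved to a random value in the range
}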

View File

@ -0,0 +1,180 @@
package lib
import (
"context"
"fmt"
"strconv"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/docker-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareDockerServiceKill contains preparation steps before chaos injection
func PrepareDockerServiceKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDockerServiceKillFault")
defer span.End()
var err error
if experimentsDetails.TargetNode == "" {
//Select node for docker-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get node name")
}
}
log.InfoWithValues("[Info]: Details of node under chaos injection", logrus.Fields{
"NodeName": experimentsDetails.TargetNode,
})
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.TargetNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
// Creating the helper pod to perform docker-service-kill
if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDockerServiceKillFaultHelperPod")
defer span.End()
privileged := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Volumes: []apiv1.Volume{
{
Name: "bus",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run",
},
},
},
{
Name: "root",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/",
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"sleep 10 && systemctl stop docker && sleep " + strconv.Itoa(experimentsDetails.ChaosDuration) + " && systemctl start docker",
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "bus",
MountPath: "/var/run",
},
{
Name: "root",
MountPath: "/node",
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privileged,
},
TTY: true,
},
},
Tolerations: []apiv1.Toleration{
{
Key: "node.kubernetes.io/not-ready",
Operator: apiv1.TolerationOperator("Exists"),
Effect: apiv1.TaintEffect("NoExecute"),
TolerationSeconds: ptrint64(int64(experimentsDetails.ChaosDuration) + 60),
},
{
Key: "node.kubernetes.io/unreachable",
Operator: apiv1.TolerationOperator("Exists"),
Effect: apiv1.TaintEffect("NoExecute"),
TolerationSeconds: ptrint64(int64(experimentsDetails.ChaosDuration) + 60),
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
func ptrint64(p int64) *int64 {
return &p
}
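The docker-service-kill helper's whole job is the shell one-liner built above; a tiny sketch showing the exact command string produced for a hypothetical 60-second chaos duration (printed only, not executed):
package main

import (
	"fmt"
	"strconv"
)

func main() {
	chaosDuration := 60 // hypothetical TOTAL_CHAOS_DURATION in seconds

	// same string the helper pod runs: wait, stop docker, hold for the chaos
	// duration, then start docker again
	cmd := "sleep 10 && systemctl stop docker && sleep " + strconv.Itoa(chaosDuration) + " && systemctl start docker"

	fmt.Println(cmd)
	// prints: sleep 10 && systemctl stop docker && sleep 60 && systemctl start docker
}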

View File

@ -0,0 +1,83 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareEBSLossByID contains the preparation and injection steps for the experiment
func PrepareEBSLossByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//get the volume id or list of instance ids
volumeIDList := strings.Split(experimentsDetails.EBSVolumeID, ",")
if len(volumeIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume id found to detach"}
}
// watching for the abort signal and revert the chaos
go ebsloss.AbortWatcher(experimentsDetails, volumeIDList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
}
return nil
}
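EBS_VOLUME_ID is expected as a comma-separated list of volume IDs; the sketch below (with made-up IDs) shows how strings.Split handles that input, including the empty-string case, which still yields a single empty element.
package main

import (
	"fmt"
	"strings"
)

func main() {
	// typical comma-separated EBS_VOLUME_ID value (IDs are made up)
	ids := strings.Split("vol-0a1b2c3d4e5f60001,vol-0a1b2c3d4e5f60002", ",")
	fmt.Println(len(ids), ids) // 2 [vol-0a1b2c3d4e5f60001 vol-0a1b2c3d4e5f60002]

	// note: splitting an empty string still returns a single empty element,
	// so callers that need "no volumes given" detection usually also check for ""
	empty := strings.Split("", ",")
	fmt.Println(len(empty), empty) // 1 []
}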

View File

@ -0,0 +1,80 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareEBSLossByTag contains the preparation and injection steps for the experiment
func PrepareEBSLossByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
targetEBSVolumeIDList := common.FilterBasedOnPercentage(experimentsDetails.VolumeAffectedPerc, experimentsDetails.TargetVolumeIDList)
log.Infof("[Chaos]:Number of volumes targeted: %v", len(targetEBSVolumeIDList))
// watching for the abort signal and revert the chaos
go ebsloss.AbortWatcher(experimentsDetails, targetEBSVolumeIDList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
}
return nil
}

View File

@ -1,207 +1,239 @@
package lib
import (
"context"
"fmt"
"os"
"time"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/awserr"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/ec2"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
ebs "github.com/litmuschaos/litmus-go/pkg/cloud/aws"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
ebs "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ebs"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
//InjectEBSLoss contains the chaos injection steps for ebs loss
func InjectEBSLoss(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// InjectChaosInSerialMode will inject the ebs loss chaos in serial mode which means one after other
func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInSerialMode")
defer span.End()
var err error
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, volumeID := range targetEBSVolumeIDList {
//Get volume attachment details
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the attachment info")
}
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err = ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err = ebs.WaitForVolumeDetachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "ebs detachment failed")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos duration
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceID, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
case "attached":
log.Info("[Skip]: The EBS volume is already attached")
default:
//Attaching the ebs volume from the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceID, device, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for %v volume", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
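Both injectors above repeat their detach/attach cycle until the elapsed time reaches ChaosDuration, sleeping for ChaosInterval between cycles; a minimal, generic sketch of that loop shape with shortened illustrative durations:
package main

import (
	"fmt"
	"time"
)

func main() {
	chaosDuration := 3 * time.Second // stands in for ChaosDuration
	chaosInterval := 1 * time.Second // stands in for ChaosInterval

	start := time.Now()
	cycle := 0
	// same shape as the injection loop: keep injecting until the total
	// chaos duration has elapsed, waiting one interval per cycle
	for time.Since(start) < chaosDuration {
		cycle++
		fmt.Printf("cycle %d: inject chaos, then wait %v\n", cycle, chaosInterval)
		time.Sleep(chaosInterval)
		fmt.Printf("cycle %d: revert chaos\n", cycle)
	}
	fmt.Printf("chaos window of %v complete after %d cycle(s)\n", chaosDuration, cycle)
}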
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
err = EBSVolumeDetach(experimentsDetails)
if err != nil {
return errors.Errorf("ebs detachment failed, err: %v", err)
// InjectChaosInParallelMode will inject the chaos in parallel mode that means all at once
func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInParallelMode")
defer span.End()
var ec2InstanceIDList, deviceList []string
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//prepare the instanceIDs and device names for all the given volumes
for _, volumeID := range targetEBSVolumeIDList {
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the attachment info")
}
if ec2InstanceID == "" || device == "" {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: "Volume not attached to any instance",
Target: fmt.Sprintf("EBS Volume ID: %v", volumeID),
}
}
ec2InstanceIDList = append(ec2InstanceIDList, ec2InstanceID)
deviceList = append(deviceList, device)
}
for _, volumeID := range targetEBSVolumeIDList {
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err := ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
}
log.Info("[Info]: Checking if the detachment process initiated")
if err := ebs.CheckEBSDetachmentInitialisation(targetEBSVolumeIDList, ec2InstanceIDList, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "failed to initialise the detachment")
}
for i, volumeID := range targetEBSVolumeIDList {
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err := ebs.WaitForVolumeDetachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "ebs detachment failed")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
for i, volumeID := range targetEBSVolumeIDList {
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceIDList[i], experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
case "attached":
log.Info("[Skip]: The EBS volume is already attached")
default:
//Attaching the ebs volume back to the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceIDList[i], deviceList[i], experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for volume %v", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
//Wait for ebs volume detachment
log.Info("[Wait]: Wait for EBS volume detachment")
if err = WaitForVolumeDetachment(experimentsDetails); err != nil {
return errors.Errorf("unable to detach the ebs volume to the ec2 instance, err: %v", err)
}
// AbortWatcher will watch for the abort signal and revert the chaos
func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, volumeIDList []string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {
//Wait for chaos duration
log.Infof("[Wait]: Waiting for the chaos duration of %vs", experimentsDetails.ChaosDuration)
time.Sleep(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
<-abort
//Getting the EBS volume attachment status
EBSStatus, err := ebs.GetEBSStatus(experimentsDetails)
if err != nil {
return errors.Errorf("failed to get the ebs status, err: %v", err)
}
if EBSStatus != "attached" {
//Attaching the ebs volume from the instance
log.Info("[Chaos]: Attaching the EBS volume from the instance")
err = EBSVolumeAttach(experimentsDetails)
log.Info("[Abort]: Chaos Revert Started")
for _, volumeID := range volumeIDList {
//Get volume attachment details
instanceID, deviceName, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
return errors.Errorf("ebs attachment failed, err: %v", err)
log.Errorf("Failed to get the attachment info: %v", err)
}
//Wait for ebs volume attachment
log.Info("[Wait]: Wait for EBS volume attachment")
if err = WaitForVolumeAttachment(experimentsDetails); err != nil {
return errors.Errorf("unable to attach the ebs volume to the ec2 instance, err: %v", err)
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(experimentsDetails.EBSVolumeID, instanceID, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get the ebs status when an abort signal is received: %v", err)
}
} else {
log.Info("[Skip]: The EBS volume is already attached")
}
if ebsState != "attached" {
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// EBSVolumeDetach will detach the ebs vol from ec2 node
func EBSVolumeDetach(experimentsDetails *experimentTypes.ExperimentDetails) error {
// Load session from shared config
sess := session.Must(session.NewSessionWithOptions(session.Options{
SharedConfigState: session.SharedConfigEnable,
Config: aws.Config{Region: aws.String(experimentsDetails.Region)},
}))
// Create new EC2 client
ec2Svc := ec2.New(sess)
input := &ec2.DetachVolumeInput{
VolumeId: aws.String(experimentsDetails.EBSVolumeID),
}
result, err := ec2Svc.DetachVolume(input)
if err != nil {
if aerr, ok := err.(awserr.Error); ok {
switch aerr.Code() {
default:
return errors.Errorf(aerr.Error())
//Wait for ebs volume detachment
//We first wait for the volume to get in detached state then we are attaching it.
log.Info("[Abort]: Wait for EBS complete volume detachment")
if err = ebs.WaitForVolumeDetachment(experimentsDetails.EBSVolumeID, instanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("Unable to detach the ebs volume: %v", err)
}
} else {
return errors.Errorf(err.Error())
}
}
log.InfoWithValues("Detaching ebs having:", logrus.Fields{
"VolumeId": *result.VolumeId,
"State": *result.State,
"Device": *result.Device,
"InstanceId": *result.InstanceId,
})
return nil
}
// EBSVolumeAttach will detach the ebs vol from ec2 node
func EBSVolumeAttach(experimentsDetails *experimentTypes.ExperimentDetails) error {
// Load session from shared config
sess := session.Must(session.NewSessionWithOptions(session.Options{
SharedConfigState: session.SharedConfigEnable,
Config: aws.Config{Region: aws.String(experimentsDetails.Region)},
}))
// Create new EC2 client
ec2Svc := ec2.New(sess)
//Attaching the ebs volume after chaos
input := &ec2.AttachVolumeInput{
Device: aws.String(experimentsDetails.DeviceName),
InstanceId: aws.String(experimentsDetails.Ec2InstanceID),
VolumeId: aws.String(experimentsDetails.EBSVolumeID),
}
result, err := ec2Svc.AttachVolume(input)
if err != nil {
if aerr, ok := err.(awserr.Error); ok {
switch aerr.Code() {
default:
return errors.Errorf(aerr.Error())
}
} else {
return errors.Errorf(err.Error())
}
}
log.InfoWithValues("Attaching ebs having:", logrus.Fields{
"VolumeId": *result.VolumeId,
"State": *result.State,
"Device": *result.Device,
"InstanceId": *result.InstanceId,
})
return nil
}
// WaitForVolumeDetachment will wait for the ebs volume to completely detach
func WaitForVolumeDetachment(experimentsDetails *experimentTypes.ExperimentDetails) error {
log.Info("[Status]: Checking ebs volume status for detachment")
err := retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
instanceState, err := ebs.GetEBSStatus(experimentsDetails)
if err != nil {
return errors.Errorf("failed to get the instance status")
}
if instanceState != "detached" {
log.Infof("The instance state is %v", instanceState)
return errors.Errorf("instance is not yet in detached state")
}
log.Infof("The instance state is %v", instanceState)
return nil
})
if err != nil {
return err
}
return nil
}
// WaitForVolumeAttachment will wait for the ebs volume to get attached on ec2 instance
func WaitForVolumeAttachment(experimentsDetails *experimentTypes.ExperimentDetails) error {
log.Info("[Status]: Checking ebs volume status for attachment")
err := retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
instanceState, err := ebs.GetEBSStatus(experimentsDetails)
if err != nil {
return errors.Errorf("failed to get the instance status")
}
if instanceState != "attached" {
log.Infof("The instance state is %v", instanceState)
return errors.Errorf("instance is not yet in attached state")
}
log.Infof("The instance state is %v", instanceState)
return nil
})
if err != nil {
return err
}
return nil
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
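WaitForVolumeAttachment and WaitForVolumeDetachment above poll through the repo's retry helper until the volume reaches the desired state within Timeout/Delay attempts. A minimal, stdlib-only sketch of the same polling idea, assuming a hypothetical getStatus callback in place of ebs.GetEBSStatus:

package main

import (
    "errors"
    "fmt"
    "time"
)

// waitForState polls getStatus every delay seconds until the reported state
// matches want, giving up after timeout/delay attempts.
func waitForState(getStatus func() (string, error), want string, timeout, delay int) error {
    for attempt := 0; attempt < timeout/delay; attempt++ {
        state, err := getStatus()
        if err != nil {
            return fmt.Errorf("failed to get the volume status: %w", err)
        }
        if state == want {
            return nil
        }
        time.Sleep(time.Duration(delay) * time.Second)
    }
    return errors.New("volume did not reach the " + want + " state within the timeout")
}

func main() {
    // The stubbed status getter reports "attached" immediately, so this returns nil.
    err := waitForState(func() (string, error) { return "attached", nil }, "attached", 30, 5)
    fmt.Println(err)
}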


@ -0,0 +1,265 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-id/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareEC2TerminateByID contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//get the instance id or list of instance ids
instanceIDList := strings.Split(experimentsDetails.Ec2InstanceID, ",")
if experimentsDetails.Ec2InstanceID == "" || len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no EC2 instance ID found to terminate"}
}
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIDList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will inject the ec2 instance termination in serial mode, i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//PowerOff the instance
for i, id := range instanceIDList {
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the EC2 instance
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
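The serial injector above (and the parallel variant below) drive chaos with the same outer loop: record a start timestamp, inject, sleep for the chaos interval, and repeat until the chaos duration elapses. A stripped-down sketch of just that control flow, with injectOnce standing in as a hypothetical placeholder for the per-instance stop/start work:

package main

import (
    "fmt"
    "time"
)

// runChaosLoop repeats injectOnce until chaosDuration seconds have elapsed,
// sleeping chaosInterval seconds between iterations, mirroring the
// duration/interval loop used by the injectors above.
func runChaosLoop(chaosDuration, chaosInterval int, injectOnce func() error) error {
    start := time.Now()
    for int(time.Since(start).Seconds()) < chaosDuration {
        if err := injectOnce(); err != nil {
            return err
        }
        time.Sleep(time.Duration(chaosInterval) * time.Second)
    }
    return nil
}

func main() {
    err := runChaosLoop(2, 1, func() error {
        fmt.Println("injecting one round of chaos")
        return nil
    })
    fmt.Println("done, err:", err)
}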
// injectChaosInParallelMode will inject the ec2 instance termination in parallel mode, i.e. all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//PowerOff the instance
for _, id := range instanceIDList {
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
for _, id := range instanceIDList {
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "reverted", "EC2 Instance ID", chaosDetails)
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the EC2 instance
if experimentsDetails.ManagedNodegroup != "enable" {
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
for _, id := range instanceIDList {
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
for _, id := range instanceIDList {
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// watching for the abort signal and revert the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
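The inject and abort channels set up in this file follow the usual buffered-signal-channel plus watcher-goroutine shape. A minimal stdlib sketch of that pattern; the revert step is only a placeholder here:

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Buffered channel of size 1 so a signal delivered before the watcher is
    // scheduled is not dropped, matching the channels created above.
    abort := make(chan os.Signal, 1)
    signal.Notify(abort, os.Interrupt, syscall.SIGTERM)

    // Watcher goroutine: block until a signal arrives, revert, then exit,
    // the same overall shape as abortWatcher above.
    go func() {
        <-abort
        fmt.Println("[Abort]: Chaos Revert Started")
        // revert logic would run here
        fmt.Println("[Abort]: Chaos Revert Completed")
        os.Exit(1)
    }()

    // Stand-in for the chaos work; press Ctrl+C to exercise the watcher.
    time.Sleep(10 * time.Second)
    fmt.Println("finished without an abort")
}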


@ -0,0 +1,296 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-tag/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
var inject, abort chan os.Signal
// PrepareEC2TerminateByTag contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
instanceIDList := common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, experimentsDetails.TargetInstanceIDList)
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIDList))
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIDList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will inject the ec2 instance termination in serial mode, i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//PowerOff the instance
for i, id := range instanceIDList {
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the EC2 instance
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode will inject the ec2 instance termination in parallel mode, i.e. all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instanceID list, %v", instanceIDList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on ec2 instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//PowerOff the instance
for _, id := range instanceIDList {
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
for _, id := range instanceIDList {
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the EC2 instance
if experimentsDetails.ManagedNodegroup != "enable" {
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
for _, id := range instanceIDList {
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
for _, id := range instanceIDList {
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// SetTargetInstance will select the target instances which are in the running state, filtered from the given instance tag
func SetTargetInstance(experimentsDetails *experimentTypes.ExperimentDetails) error {
instanceIDList, err := awslib.GetInstanceList(experimentsDetails.Ec2InstanceTag, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the instance id list")
}
if len(instanceIDList) == 0 {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeTargetSelection,
Reason: fmt.Sprintf("no instance found with the given tag %v, in region %v", experimentsDetails.Ec2InstanceTag, experimentsDetails.Region),
}
}
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the instance status while selecting the target instances")
}
if instanceState == "running" {
experimentsDetails.TargetInstanceIDList = append(experimentsDetails.TargetInstanceIDList, id)
}
}
if len(experimentsDetails.TargetInstanceIDList) == 0 {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: "failed to get any running instance",
Target: fmt.Sprintf("EC2 Instance Tag: %v", experimentsDetails.Ec2InstanceTag)}
}
log.InfoWithValues("[Info]: Targeting the running instances filtered from instance tag", logrus.Fields{
"Total number of instances filtered": len(instanceIDList),
"Number of running instances filtered": len(experimentsDetails.TargetInstanceIDList),
})
return nil
}
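SetTargetInstance above keeps only the instances whose status is reported as "running". A generic sketch of that filtering step, with getStatus as a hypothetical stand-in for awslib.GetEC2InstanceStatus:

package main

import "fmt"

// filterRunning returns the IDs whose reported state is "running".
func filterRunning(ids []string, getStatus func(id string) (string, error)) ([]string, error) {
    var running []string
    for _, id := range ids {
        state, err := getStatus(id)
        if err != nil {
            return nil, fmt.Errorf("failed to get the instance status for %s: %w", id, err)
        }
        if state == "running" {
            running = append(running, id)
        }
    }
    return running, nil
}

func main() {
    states := map[string]string{"i-1": "running", "i-2": "stopped", "i-3": "running"}
    targets, _ := filterRunning([]string{"i-1", "i-2", "i-3"}, func(id string) (string, error) {
        return states[id], nil
    })
    fmt.Println(targets) // [i-1 i-3]
}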
// watching for the abort signal and revert the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}


@ -1,196 +0,0 @@
package lib
import (
"time"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/awserr"
"github.com/aws/aws-sdk-go/aws/session"
"github.com/aws/aws-sdk-go/service/ec2"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
)
//InjectEC2Terminate contains the chaos injection steps for ec2 terminate chaos
func InjectEC2Terminate(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var err error
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
err = EC2Stop(experimentsDetails)
if err != nil {
return errors.Errorf("ec2 instance failed to stop, err: %v", err)
}
//Wait for ec2 instance to completely stop
log.Info("[Wait]: Wait for EC2 instance to come in stopped state")
if err = WaitForEC2Down(experimentsDetails); err != nil {
return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
}
//Wait for chaos duration
log.Infof("[Wait]: Waiting for chaos duration of %vs before starting the instance", experimentsDetails.ChaosDuration)
time.Sleep(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
//Starting the EC2 instance
log.Info("[Chaos]: Starting back the EC2 instance")
err = EC2Start(experimentsDetails)
if err != nil {
return errors.Errorf("ec2 instance failed to start, err: %v", err)
}
//Wait for ec2 instance to come in running state
log.Info("[Wait]: Wait for EC2 instance to get in running state")
if err = WaitForEC2Up(experimentsDetails); err != nil {
return errors.Errorf("unable to start the ec2 instance, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// EC2Stop will stop an aws ec2 instance
func EC2Stop(experimentsDetails *experimentTypes.ExperimentDetails) error {
// Load session from shared config
sess := session.Must(session.NewSessionWithOptions(session.Options{
SharedConfigState: session.SharedConfigEnable,
Config: aws.Config{Region: aws.String(experimentsDetails.Region)},
}))
// Create new EC2 client
ec2Svc := ec2.New(sess)
input := &ec2.StopInstancesInput{
InstanceIds: []*string{
aws.String(experimentsDetails.Ec2InstanceID),
},
}
result, err := ec2Svc.StopInstances(input)
if err != nil {
if aerr, ok := err.(awserr.Error); ok {
switch aerr.Code() {
default:
return errors.Errorf(aerr.Error())
}
} else {
return errors.Errorf(err.Error())
}
}
log.InfoWithValues("Stopping an ec2 instance:", logrus.Fields{
"CurrentState": *result.StoppingInstances[0].CurrentState.Name,
"PreviousState": *result.StoppingInstances[0].PreviousState.Name,
"InstanceId": *result.StoppingInstances[0].InstanceId,
})
return nil
}
// EC2Start will start an aws ec2 instance
func EC2Start(experimentsDetails *experimentTypes.ExperimentDetails) error {
// Load session from shared config
sess := session.Must(session.NewSessionWithOptions(session.Options{
SharedConfigState: session.SharedConfigEnable,
Config: aws.Config{Region: aws.String(experimentsDetails.Region)},
}))
// Create new EC2 client
ec2Svc := ec2.New(sess)
input := &ec2.StartInstancesInput{
InstanceIds: []*string{
aws.String(experimentsDetails.Ec2InstanceID),
},
}
result, err := ec2Svc.StartInstances(input)
if err != nil {
if aerr, ok := err.(awserr.Error); ok {
switch aerr.Code() {
default:
return errors.Errorf(aerr.Error())
}
} else {
return errors.Errorf(err.Error())
}
}
log.InfoWithValues("Starting ec2 instance:", logrus.Fields{
"CurrentState": *result.StartingInstances[0].CurrentState.Name,
"PreviousState": *result.StartingInstances[0].PreviousState.Name,
"InstanceId": *result.StartingInstances[0].InstanceId,
})
return nil
}
//WaitForEC2Down will wait for the ec2 instance to get in stopped state
func WaitForEC2Down(experimentsDetails *experimentTypes.ExperimentDetails) error {
log.Info("[Status]: Checking EC2 instance status")
err := retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
instanceState, err := awslib.GetEC2InstanceStatus(experimentsDetails)
if err != nil {
return errors.Errorf("failed to get the instance status")
}
if instanceState != "stopped" {
log.Infof("The instance state is %v", instanceState)
return errors.Errorf("instance is not yet in stopped state")
}
log.Infof("The instance state is %v", instanceState)
return nil
})
if err != nil {
return err
}
return nil
}
//WaitForEC2Up will wait for the ec2 instance to get in running state
func WaitForEC2Up(experimentsDetails *experimentTypes.ExperimentDetails) error {
log.Info("[Status]: Checking EC2 instance status")
err := retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
instanceState, err := awslib.GetEC2InstanceStatus(experimentsDetails)
if err != nil {
return errors.Errorf("failed to get the instance status")
}
if instanceState != "running" {
log.Infof("The instance state is %v", instanceState)
return errors.Errorf("instance is not yet in running state")
}
log.Infof("The instance state is %v", instanceState)
return nil
})
if err != nil {
return err
}
return nil
}
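EC2Stop and EC2Start above unwrap SDK failures through the awserr.Error interface before reporting them. A compact, self-contained sketch of that classification pattern (describeAWSError is a hypothetical helper, not part of the diff):

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws/awserr"
)

// describeAWSError reports the AWS error code and message when err implements
// awserr.Error, and falls back to the plain error string otherwise.
func describeAWSError(err error) string {
    if aerr, ok := err.(awserr.Error); ok {
        return fmt.Sprintf("aws error %s: %s", aerr.Code(), aerr.Message())
    }
    return err.Error()
}

func main() {
    // awserr.New builds a synthetic awserr.Error, which is handy for local testing.
    err := awserr.New("UnauthorizedOperation", "not allowed to stop this instance", nil)
    fmt.Println(describeAWSError(err))
}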


@ -0,0 +1,312 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareDiskVolumeLossByLabel contains the preparation and injection steps for the experiment
func PrepareDiskVolumeLossByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPDiskVolumeLossFaultByLabel")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
diskVolumeNamesList := common.FilterBasedOnPercentage(experimentsDetails.DiskAffectedPerc, experimentsDetails.TargetDiskVolumeNamesList)
if err := getDeviceNamesAndVMInstanceNames(diskVolumeNamesList, computeService, experimentsDetails); err != nil {
return err
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// watching for the abort signal and revert the chaos
go abortWatcher(computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
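common.FilterBasedOnPercentage above narrows the labelled disk list down to DiskAffectedPerc percent of the candidates. As a rough, stdlib-only illustration of percentage-based target selection (pickByPercentage is hypothetical and not the library's actual implementation):

package main

import (
    "fmt"
    "math"
)

// pickByPercentage returns the first ceil(len*perc/100) items, with a floor of
// one item when the list is non-empty. Illustration only.
func pickByPercentage(items []string, perc int) []string {
    if len(items) == 0 || perc <= 0 {
        return nil
    }
    n := int(math.Ceil(float64(len(items)) * float64(perc) / 100.0))
    if n < 1 {
        n = 1
    }
    if n > len(items) {
        n = len(items)
    }
    return items[:n]
}

func main() {
    disks := []string{"disk-a", "disk-b", "disk-c", "disk-d"}
    fmt.Println(pickByPercentage(disks, 50)) // [disk-a disk-b]
}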
// injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
return stacktrace.Propagate(err, "failed to get the disk volume status")
}
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume back to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for %v volume", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode that means all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on vm instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
}
for i := range targetDiskVolumeNamesList {
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
return stacktrace.Propagate(err, "failed to get the disk status")
}
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to get in detached state then we are attaching it.
log.Infof("[Abort]: Wait for %s complete disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume back to the instance
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getDeviceNamesAndVMInstanceNames fetches the device name and attached VM instance name for each target disk
func getDeviceNamesAndVMInstanceNames(diskVolumeNamesList []string, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails) error {
for i := range diskVolumeNamesList {
instanceName, err := gcp.GetVolumeAttachmentDetails(computeService, experimentsDetails.GCPProjectID, experimentsDetails.Zones, diskVolumeNamesList[i])
if err != nil || instanceName == "" {
return stacktrace.Propagate(err, "failed to get the disk attachment info")
}
deviceName, err := gcp.GetDiskDeviceNameForVM(computeService, diskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones, instanceName)
if err != nil {
return stacktrace.Propagate(err, "failed to fetch the disk device name")
}
experimentsDetails.TargetDiskInstanceNamesList = append(experimentsDetails.TargetDiskInstanceNamesList, instanceName)
experimentsDetails.DeviceNamesList = append(experimentsDetails.DeviceNamesList, deviceName)
}
return nil
}
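getDeviceNamesAndVMInstanceNames above builds two index-aligned slices, the attached VM instance name and the device name, for each target disk. A minimal sketch of that pairing, assuming hypothetical lookupInstance and lookupDevice stand-ins for the gcp helpers:

package main

import "fmt"

// attachmentInfo pairs each disk with the instance and device it is attached to,
// returning two slices that are index-aligned with disks.
func attachmentInfo(disks []string,
    lookupInstance func(disk string) (string, error),
    lookupDevice func(disk, instance string) (string, error)) (instances, devices []string, err error) {
    for _, disk := range disks {
        instance, err := lookupInstance(disk)
        if err != nil {
            return nil, nil, fmt.Errorf("failed to get the attachment info for %s: %w", disk, err)
        }
        if instance == "" {
            return nil, nil, fmt.Errorf("disk %s is not attached to any instance", disk)
        }
        device, err := lookupDevice(disk, instance)
        if err != nil {
            return nil, nil, fmt.Errorf("failed to fetch the device name for %s: %w", disk, err)
        }
        instances = append(instances, instance)
        devices = append(devices, device)
    }
    return instances, devices, nil
}

func main() {
    instances, devices, _ := attachmentInfo([]string{"disk-1", "disk-2"},
        func(d string) (string, error) { return "vm-for-" + d, nil },
        func(d, i string) (string, error) { return "device-for-" + d, nil })
    fmt.Println(instances, devices)
}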


@ -0,0 +1,303 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareDiskVolumeLoss contains the preparation and injection steps for the experiment
func PrepareDiskVolumeLoss(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMDiskLossFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//get the disk volume names list
diskNamesList := strings.Split(experimentsDetails.DiskVolumeNames, ",")
//get the disk zones list
diskZonesList := strings.Split(experimentsDetails.Zones, ",")
//get the device names for the given disks
if err := getDeviceNamesList(computeService, experimentsDetails, diskNamesList, diskZonesList); err != nil {
return stacktrace.Propagate(err, "failed to fetch the disk device names")
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// watching for the abort signal and revert the chaos
go abortWatcher(computeService, experimentsDetails, diskNamesList, diskZonesList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
return stacktrace.Propagate(err, fmt.Sprintf("failed to get %s disk volume status", targetDiskVolumeNamesList[i]))
}
switch diskState {
case "attached":
log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
log.Infof("[Chaos]: Attaching %s disk volume back to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode that means all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on vm instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
}
for i := range targetDiskVolumeNamesList {
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for the chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
return errors.Errorf("failed to get the disk status, err: %v", err)
}
switch diskState {
case "attached":
log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, abort chan os.Signal, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for i := range targetDiskVolumeNamesList {
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to reach the detached state, then attach it.
log.Infof("[Abort]: Wait for complete disk volume detachment for %s", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume to the instance
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getDeviceNamesList fetches the device names for the target disks
func getDeviceNamesList(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, diskNamesList, diskZonesList []string) error {
for i := range diskNamesList {
deviceName, err := gcp.GetDiskDeviceNameForVM(computeService, diskNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.TargetDiskInstanceNamesList[i])
if err != nil {
return err
}
experimentsDetails.DeviceNamesList = append(experimentsDetails.DeviceNamesList, deviceName)
}
return nil
}
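The functions above (and their serial counterparts) follow the duration/interval pattern that recurs across these fault libraries: record a start timestamp, keep injecting while the elapsed seconds are below ChaosDuration, and wait for ChaosInterval between cycles. Below is a minimal stand-alone sketch of that loop with made-up values and a placeholder injectOnce in place of the real detach/attach and probe steps.

package main

import (
	"fmt"
	"time"
)

// injectOnce stands in for one detach/attach cycle; illustrative only.
func injectOnce(iteration int) {
	fmt.Printf("[Chaos]: running iteration %d\n", iteration)
}

func main() {
	chaosDuration := 10 // total chaos window in seconds (TOTAL_CHAOS_DURATION)
	chaosInterval := 2  // pause between iterations in seconds (CHAOS_INTERVAL)

	chaosStartTimeStamp := time.Now()
	duration := int(time.Since(chaosStartTimeStamp).Seconds())
	for i := 0; duration < chaosDuration; i++ {
		injectOnce(i)
		// wait for the chaos interval before the next cycle
		time.Sleep(time.Duration(chaosInterval) * time.Second)
		duration = int(time.Since(chaosStartTimeStamp).Seconds())
	}
	fmt.Println("[Chaos]: chaos duration is over")
}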


@ -0,0 +1,293 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var inject, abort chan os.Signal
// PrepareVMStopByLabel executes the experiment steps by injecting chaos into target VM instances
func PrepareVMStopByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPVMInstanceStopFaultByLabel")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
instanceNamesList := common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, experimentsDetails.TargetVMInstanceNameList)
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceNamesList))
// watch for the abort signal and revert the chaos
go abortWatcher(computeService, experimentsDetails, instanceNamesList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp of when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Stop the instance
for i := range instanceNamesList {
//Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "VM instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
//Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to stop", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for the chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "enable":
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start %s vm instance")
}
default:
// starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start %s vm instance")
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp of when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// power-off the instance
for i := range instanceNamesList {
// stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
}
for i := range instanceNamesList {
// wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "enable":
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start the vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
default:
// starting the VM instance
for i := range instanceNamesList {
log.Info("[Chaos]: Starting back the VM instance")
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
}
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return stacktrace.Propagate(err, "unable to start the vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for i := range instanceNamesList {
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil {
log.Errorf("Failed to get %s instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
}
if instanceState != "RUNNING" && experimentsDetails.ManagedInstanceGroup != "enable" {
log.Info("[Abort]: Waiting for the VM instance to shut down")
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
log.Errorf("Unable to wait till stop of %s instance, err: %v", instanceNamesList[i], err)
}
log.Info("[Abort]: Starting VM instance as abort signal received")
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil {
log.Errorf("%s instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
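abortWatcher above is an instance of the signal-driven revert pattern used throughout these libraries: a buffered channel registered with signal.Notify, drained by a goroutine that reverts each target and exits with a non-zero code. The following stripped-down sketch shows the same shape with placeholder targets and revert logic instead of the real GCP calls.

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// abort channel relays SIGINT/SIGTERM, mirroring the watcher in the fault libraries
	abort := make(chan os.Signal, 1)
	signal.Notify(abort, os.Interrupt, syscall.SIGTERM)

	targets := []string{"instance-1", "instance-2"} // placeholder targets

	go func() {
		<-abort
		fmt.Println("[Abort]: Chaos Revert Started")
		for _, t := range targets {
			// placeholder revert; the real code restarts VMs or re-attaches disks
			fmt.Printf("[Abort]: reverted %s\n", t)
		}
		fmt.Println("[Abort]: Chaos Revert Completed")
		os.Exit(1)
	}()

	// simulate chaos in progress; an interrupt triggers the watcher above
	time.Sleep(30 * time.Second)
}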


@ -0,0 +1,304 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var (
err error
inject, abort chan os.Signal
)
// PrepareVMStop contains the preparation and injection steps for the experiment
func PrepareVMStop(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// get the instance name or list of instance names
instanceNamesList := strings.Split(experimentsDetails.VMInstanceName, ",")
// get the zone name or list of corresponding zones for the instances
instanceZonesList := strings.Split(experimentsDetails.Zones, ",")
go abortWatcher(computeService, experimentsDetails, instanceNamesList, instanceZonesList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// wait for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp of when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Stop the instance
for i := range instanceNamesList {
//Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
//Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for the chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "disable":
// starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
default:
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp of when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM instance list, %v", instanceNamesList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// power-off the instance
for i := range instanceNamesList {
// stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
}
for i := range instanceNamesList {
// wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch experimentsDetails.ManagedInstanceGroup {
case "disable":
// starting the VM instance
for i := range instanceNamesList {
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "vm instance failed to start")
}
}
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
default:
// wait for VM instance to get in running state
for i := range instanceNamesList {
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return stacktrace.Propagate(err, "unable to start vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, zonesList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
if experimentsDetails.ManagedInstanceGroup != "enable" {
for i := range instanceNamesList {
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil {
log.Errorf("Failed to get %s vm instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
}
if instanceState != "RUNNING" {
log.Infof("[Abort]: Waiting for %s VM instance to shut down", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i]); err != nil {
log.Errorf("Unable to wait till stop of %s instance, err: %v", instanceNamesList[i], err)
}
log.Infof("[Abort]: Starting %s VM instance as abort signal is received", instanceNamesList[i])
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil {
log.Errorf("%s VM instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
}
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
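PrepareVMStop pairs each name from the comma-separated VMInstanceName value with the zone at the same index of the comma-separated Zones value. Below is a small sketch of that pairing with made-up instance and zone values; the real experiment reads these from its ENV and is assumed to validate them upstream.

package main

import (
	"fmt"
	"strings"
)

func main() {
	// comma-separated values as they would arrive from the experiment ENV (illustrative)
	vmInstanceNames := "vm-a,vm-b,vm-c"
	zones := "us-central1-a,us-central1-b,us-central1-c"

	instanceNamesList := strings.Split(vmInstanceNames, ",")
	instanceZonesList := strings.Split(zones, ",")

	if len(instanceNamesList) != len(instanceZonesList) {
		fmt.Println("instance and zone lists must be of equal length")
		return
	}

	for i := range instanceNamesList {
		// each instance is stopped/started in the zone at the same index
		fmt.Printf("target %s in zone %s\n", instanceNamesList[i], instanceZonesList[i])
	}
}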


@ -0,0 +1,332 @@
package helper
import (
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var (
err error
inject, abort chan os.Signal
)
// Helper injects the http chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodHTTPFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed for the helper pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
err := prepareK8sHttpChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// prepareK8sHttpChaos contains the preparation steps before chaos injection
func prepareK8sHttpChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPauseAndSandboxPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watch for the abort signal and revert the chaos
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
for _, t := range targets {
// injecting http chaos inside target container
if err = injectChaos(experimentsDetails, t); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaos(experimentsDetails, t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: chaos duration is over, reverting chaos")
var errList []string
for _, t := range targets {
// clean up the ip rules and proxy process after chaos injection
err := revertChaos(experimentsDetails, t)
if err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// injectChaos injects the http chaos in the target container and adds a ruleset to iptables to redirect the target port
func injectChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
if err := startProxy(experimentDetails, t.Pid); err != nil {
killErr := killProxy(t.Pid, t.Source)
if killErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
}
return stacktrace.Propagate(err, "could not start proxy server")
}
if err := addIPRuleSet(experimentDetails, t.Pid); err != nil {
killErr := killProxy(t.Pid, t.Source)
if killErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
}
return stacktrace.Propagate(err, "could not add ip rules")
}
return nil
}
// revertChaos reverts the http chaos in the target container
func revertChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
var errList []string
if err := removeIPRuleSet(experimentDetails, t.Pid); err != nil {
errList = append(errList, err.Error())
}
if err := killProxy(t.Pid, t.Source); err != nil {
errList = append(errList, err.Error())
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
return nil
}
// startProxy starts the proxy process inside the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the proxy-related commands inside it.
func startProxy(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
toxics := os.Getenv("TOXIC_COMMAND")
// starting toxiproxy server inside the target container
startProxyServerCommand := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-server -host=0.0.0.0 > /dev/null 2>&1 &)", pid)
// Creating a proxy for the targeted service in the target container
createProxyCommand := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-cli create -l 0.0.0.0:%d -u 0.0.0.0:%d proxy)", pid, experimentDetails.ProxyPort, experimentDetails.TargetServicePort)
createToxicCommand := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-cli toxic add %s --toxicity %f proxy)", pid, toxics, float32(experimentDetails.Toxicity)/100.0)
// sleep 2 gives the proxy server time to become ready before creating the proxy and adding toxics
chaosCommand := fmt.Sprintf("%s && sleep 2 && %s && %s", startProxyServerCommand, createProxyCommand, createToxicCommand)
log.Infof("[Chaos]: Starting proxy server")
if err := common.RunBashCommand(chaosCommand, "failed to start proxy server", experimentDetails.ChaosPodName); err != nil {
return err
}
log.Info("[Info]: Proxy started successfully")
return nil
}
const NoProxyToKill = "you need to specify whom to kill"
// killProxy kills the proxy process inside the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the kill command inside it.
func killProxy(pid int, source string) error {
stopProxyServerCommand := fmt.Sprintf("sudo nsenter -t %d -n sudo kill -9 $(ps aux | grep [t]oxiproxy | awk 'FNR==2{print $2}')", pid)
log.Infof("[Chaos]: Stopping proxy server")
if err := common.RunBashCommand(stopProxyServerCommand, "failed to stop proxy server", source); err != nil {
return err
}
log.Info("[Info]: Proxy stopped successfully")
return nil
}
// addIPRuleSet adds the ip rule set to iptables in the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the iptables-related command inside it.
func addIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
// It inserts the proxy-port REDIRECT rule at the beginning of the PREROUTING chain
// so that it matches all incoming packets on the target port
// and redirects matching requests to the proxy port.
addIPRuleSetCommand := fmt.Sprintf("(sudo nsenter -t %d -n iptables -t nat -I PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d)", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Adding IPtables ruleset")
if err := common.RunBashCommand(addIPRuleSetCommand, "failed to add ip rules", experimentDetails.ChaosPodName); err != nil {
return err
}
log.Info("[Info]: IP rule set added successfully")
return nil
}
const NoIPRulesetToRemove = "No chain/target/match by that name"
// removeIPRuleSet removes the ip rule set from iptables in the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the iptables-related command inside it.
func removeIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
removeIPRuleSetCommand := fmt.Sprintf("sudo nsenter -t %d -n iptables -t nat -D PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Removing IPtables ruleset")
if err := common.RunBashCommand(removeIPRuleSetCommand, "failed to remove ip rules", experimentDetails.ChaosPodName); err != nil {
return err
}
log.Info("[Info]: IP rule set removed successfully")
return nil
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "")
experimentDetails.TargetServicePort, _ = strconv.Atoi(types.Getenv("TARGET_SERVICE_PORT", ""))
experimentDetails.ProxyPort, _ = strconv.Atoi(types.Getenv("PROXY_PORT", ""))
experimentDetails.Toxicity, _ = strconv.Atoi(types.Getenv("TOXICITY", "100"))
}
// abortWatcher continuously watches for the abort signal
func abortWatcher(targets []targetDetails, resultName, chaosNS string, experimentDetails *experimentTypes.ExperimentDetails) {
<-abort
log.Info("[Abort]: Killing process started because of terminated signal received")
log.Info("[Abort]: Chaos Revert Started")
retry := 3
for retry > 0 {
for _, t := range targets {
if err = revertChaos(experimentDetails, t); err != nil {
if strings.Contains(err.Error(), NoIPRulesetToRemove) && strings.Contains(err.Error(), NoProxyToKill) {
continue
}
log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
Source string
}
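For reference, here is how the shell commands issued by startProxy and addIPRuleSet above compose for a concrete set of values. The pid, ports, toxic string, toxicity, and network interface in this sketch are illustrative only; the helper derives them from the container runtime and the experiment ENV.

package main

import "fmt"

func main() {
	// illustrative values; not taken from a real run
	pid := 4242
	proxyPort := 20000
	targetServicePort := 80
	toxic := "latency --latency=2000"
	toxicity := 100
	networkInterface := "eth0"

	start := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-server -host=0.0.0.0 > /dev/null 2>&1 &)", pid)
	create := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-cli create -l 0.0.0.0:%d -u 0.0.0.0:%d proxy)", pid, proxyPort, targetServicePort)
	addToxic := fmt.Sprintf("(sudo nsenter -t %d -n toxiproxy-cli toxic add %s --toxicity %f proxy)", pid, toxic, float32(toxicity)/100.0)
	redirect := fmt.Sprintf("(sudo nsenter -t %d -n iptables -t nat -I PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d)", pid, networkInterface, targetServicePort, proxyPort)

	// the helper chains the first three with a short sleep, then applies the iptables rule
	fmt.Println(start + " && sleep 2 && " + create + " && " + addToxic)
	fmt.Println(redirect)
}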


@ -0,0 +1,37 @@
package header
import (
"context"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpModifyHeaderChaos contains the steps to prepare and inject http modify header chaos
func PodHttpModifyHeaderChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyHeaderFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"Headers": experimentsDetails.HeadersMap,
"Header Mode": experimentsDetails.HeaderMode,
})
stream := "downstream"
if experimentsDetails.HeaderMode == "request" {
stream = "upstream"
}
args := "-t header --" + stream + " -a headers='" + (experimentsDetails.HeadersMap) + "' -a mode=" + experimentsDetails.HeaderMode
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
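As a quick illustration of the args string built above, the sketch below reproduces the same concatenation for a sample request-mode run. The header map value is just an example string; the experiment passes HeadersMap through as-is.

package main

import "fmt"

func main() {
	// illustrative tunables; HeadersMap arrives as a string from the experiment ENV
	headersMap := `{"X-Litmus-Test":"true"}`
	headerMode := "request"

	stream := "downstream"
	if headerMode == "request" {
		stream = "upstream"
	}

	args := "-t header --" + stream + " -a headers='" + headersMap + "' -a mode=" + headerMode
	fmt.Println(args)
	// prints: -t header --upstream -a headers='{"X-Litmus-Test":"true"}' -a mode=request
}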


@ -0,0 +1,266 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
var err error
// Get the target pod details for the chaos execution
// if the target pods are not defined, it derives a random target pod list using the pods-affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Getting the serviceAccountName; the helper pod needs this permission to create events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the http chaos in all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform http chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode injects the http chaos in all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID, args string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateHTTPChaosHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "cri-socket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"./helpers -name http-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets, args),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
MountPath: experimentsDetails.SocketPath,
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"NET_ADMIN",
"SYS_ADMIN",
},
},
},
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env variables required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets, args string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("TOXIC_COMMAND", args).
SetEnv("NETWORK_INTERFACE", experimentsDetails.NetworkInterface).
SetEnv("TARGET_SERVICE_PORT", strconv.Itoa(experimentsDetails.TargetServicePort)).
SetEnv("PROXY_PORT", strconv.Itoa(experimentsDetails.ProxyPort)).
SetEnv("TOXICITY", strconv.Itoa(experimentsDetails.Toxicity)).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
// SetChaosTunables selects a random value within the given range of values.
// If a range is not provided, it retains the initially provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
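createHelperPod receives its targets as name:namespace:container triples joined with ';' per node, which getPodEnv then exposes to the helper through the TARGETS variable that the helper's ParseTargets reads back. Below is a small sketch of how that value is assembled, with made-up pod names and namespaces.

package main

import (
	"fmt"
	"strings"
)

type target struct {
	Name, Namespace, TargetContainer string
}

func main() {
	// targets scheduled on the same node; values are illustrative
	perNode := []target{
		{Name: "nginx-abc", Namespace: "default", TargetContainer: "nginx"},
		{Name: "nginx-def", Namespace: "default", TargetContainer: "nginx"},
	}

	var targetsPerNode []string
	for _, t := range perNode {
		targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.TargetContainer))
	}

	// this string is passed to the helper pod via the TARGETS env variable
	fmt.Println(strings.Join(targetsPerNode, ";"))
	// prints: nginx-abc:default:nginx;nginx-def:default:nginx
}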


@ -0,0 +1,33 @@
package latency
import (
"context"
"strconv"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpLatencyChaos contains the steps to prepare and inject http latency chaos
func PodHttpLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpLatencyFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"Latency": experimentsDetails.Latency,
})
args := "-t latency -a latency=" + strconv.Itoa(experimentsDetails.Latency)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@ -0,0 +1,50 @@
package modifybody
import (
"context"
"fmt"
"math"
"strings"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpModifyBodyChaos contains the steps to prepare and inject http modify body chaos
func PodHttpModifyBodyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyBodyFault")
defer span.End()
// responseBodyMaxLength defines the max length of response body string to be printed. It is taken as
// the min of length of body and 120 characters to avoid printing large response body.
responseBodyMaxLength := int(math.Min(float64(len(experimentsDetails.ResponseBody)), 120))
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"ResponseBody": experimentsDetails.ResponseBody[0:responseBodyMaxLength],
"Content Type": experimentsDetails.ContentType,
"Content Encoding": experimentsDetails.ContentEncoding,
})
args := fmt.Sprintf(
`-t modify_body -a body="%v" -a content_type=%v -a content_encoding=%v`,
EscapeQuotes(experimentsDetails.ResponseBody), experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// EscapeQuotes escapes the quotes in the given string
func EscapeQuotes(input string) string {
output := strings.ReplaceAll(input, `\`, `\\`)
output = strings.ReplaceAll(output, `"`, `\"`)
return output
}
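A minimal, self-contained sketch (hypothetical body and content values; it mirrors EscapeQuotes locally instead of importing the package) of how the modify_body argument string ends up quoted:

package main

import (
	"fmt"
	"strings"
)

// escapeQuotes mirrors EscapeQuotes above: escape backslashes first, then double quotes.
func escapeQuotes(input string) string {
	output := strings.ReplaceAll(input, `\`, `\\`)
	return strings.ReplaceAll(output, `"`, `\"`)
}

func main() {
	body := `{"status": "chaos"}` // hypothetical response body
	args := fmt.Sprintf(`-t modify_body -a body="%v" -a content_type=%v -a content_encoding=%v`,
		escapeQuotes(body), "application/json", "gzip")
	fmt.Println(args)
	// Output: -t modify_body -a body="{\"status\": \"chaos\"}" -a content_type=application/json -a content_encoding=gzip
}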

View File

@ -0,0 +1,33 @@
package reset
import (
"context"
"strconv"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpResetPeerChaos contains the steps to prepare and inject http reset peer chaos
func PodHttpResetPeerChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPResetPeerFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"Reset Timeout": experimentsDetails.ResetTimeout,
})
args := "-t reset_peer -a timeout=" + strconv.Itoa(experimentsDetails.ResetTimeout)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}

View File

@ -0,0 +1,118 @@
package statuscode
import (
"context"
"fmt"
"math"
"math/rand"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"go.opentelemetry.io/otel"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
body "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib/modify-body"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
)
var acceptedStatusCodes = []string{
"200", "201", "202", "204",
"300", "301", "302", "304", "307",
"400", "401", "403", "404",
"500", "501", "502", "503", "504",
}
// PodHttpStatusCodeChaos contains the steps to prepare and inject http status code chaos
func PodHttpStatusCodeChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpStatusCodeFault")
defer span.End()
// responseBodyMaxLength defines the max length of the response body string to be printed. It is taken as
// the minimum of the body length and 120 characters, to avoid printing a large response body.
responseBodyMaxLength := int(math.Min(float64(len(experimentsDetails.ResponseBody)), 120))
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
"Listen Port": experimentsDetails.ProxyPort,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Toxicity": experimentsDetails.Toxicity,
"StatusCode": experimentsDetails.StatusCode,
"ModifyResponseBody": experimentsDetails.ModifyResponseBody,
"ResponseBody": experimentsDetails.ResponseBody[0:responseBodyMaxLength],
"Content Type": experimentsDetails.ContentType,
"Content Encoding": experimentsDetails.ContentEncoding,
})
args := fmt.Sprintf(
`-t status_code -a status_code=%s -a modify_response_body=%d -a response_body="%v" -a content_type=%s -a content_encoding=%s`,
experimentsDetails.StatusCode, stringBoolToInt(experimentsDetails.ModifyResponseBody), body.EscapeQuotes(experimentsDetails.ResponseBody),
experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// GetStatusCode performs two functions:
// 1. It checks whether a status code is provided; if not, it selects a random status code from the supported list
// 2. It checks whether the provided status code(s) are valid
func GetStatusCode(statusCode string) (string, error) {
if statusCode == "" {
log.Info("[Info]: No status code provided. Selecting a status code randomly from supported status codes")
return acceptedStatusCodes[rand.Intn(len(acceptedStatusCodes))], nil
}
statusCodeList := strings.Split(statusCode, ",")
rand.Seed(time.Now().Unix())
if len(statusCodeList) == 1 {
if checkStatusCode(statusCodeList[0], acceptedStatusCodes) {
return statusCodeList[0], nil
}
} else {
acceptedCodes := getAcceptedCodesInList(statusCodeList, acceptedStatusCodes)
if len(acceptedCodes) == 0 {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("invalid status code: %s", statusCode)}
}
return acceptedCodes[rand.Intn(len(acceptedCodes))], nil
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("status code '%s' is not supported. Supported status codes are: %v", statusCode, acceptedStatusCodes)}
}
// getAcceptedCodesInList returns the list of accepted status codes from a list of status codes
func getAcceptedCodesInList(statusCodeList []string, acceptedStatusCodes []string) []string {
var acceptedCodes []string
for _, statusCode := range statusCodeList {
if checkStatusCode(statusCode, acceptedStatusCodes) {
acceptedCodes = append(acceptedCodes, statusCode)
}
}
return acceptedCodes
}
// checkStatusCode checks if the provided status code is present in the acceptedStatusCodes list
func checkStatusCode(statusCode string, acceptedStatusCodes []string) bool {
for _, code := range acceptedStatusCodes {
if code == statusCode {
return true
}
}
return false
}
// stringBoolToInt converts a boolean string to an int (1 for "true", 0 otherwise)
func stringBoolToInt(b string) int {
parsedBool, err := strconv.ParseBool(b)
if err != nil {
return 0
}
if parsedBool {
return 1
}
return 0
}
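A usage sketch for GetStatusCode with hypothetical inputs; the import path is an assumption, following the http-chaos/lib layout shown in this diff:

package main

import (
	"fmt"

	// assumed location of this package within the repository layout
	statuscode "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib/statuscode"
)

func main() {
	code, err := statuscode.GetStatusCode("404,999") // "999" is not in acceptedStatusCodes and is dropped
	fmt.Println(code, err)                           // 404 <nil>

	_, err = statuscode.GetStatusCode("999") // a single unsupported code is rejected
	fmt.Println(err)                         // status code '999' is not supported. Supported status codes are: [...]
}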

View File

@ -0,0 +1,165 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/load/k6-loadgen/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectK6LoadGenFault")
defer span.End()
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
// creating the helper pod to perform k6-loadgen chaos
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareK6LoadGenFault")
defer span.End()
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Starting the k6-loadgen experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not execute chaos")
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateK6LoadGenFaultHelperPod")
defer span.End()
const volumeName = "script-volume"
const mountPath = "/mnt"
var envs []corev1.EnvVar
args := []string{
mountPath + "/" + experimentsDetails.ScriptSecretKey,
"-q",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--tag",
"trace_id=" + span.SpanContext().TraceID().String(),
}
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
envs = []corev1.EnvVar{
{
Name: "K6_OTEL_METRIC_PREFIX",
Value: experimentsDetails.OTELMetricPrefix,
},
{
Name: "K6_OTEL_GRPC_EXPORTER_INSECURE",
Value: "true",
},
{
Name: "K6_OTEL_GRPC_EXPORTER_ENDPOINT",
Value: otelExporterEndpoint,
},
}
args = append(args, "--out", "experimental-opentelemetry")
}
helperPod := &corev1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: corev1.PodSpec{
RestartPolicy: corev1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Containers: []corev1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: corev1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"k6",
"run",
},
Args: args,
Env: envs,
Resources: chaosDetails.Resources,
VolumeMounts: []corev1.VolumeMount{
{
Name: volumeName,
MountPath: mountPath,
},
},
},
},
Volumes: []corev1.Volume{
{
Name: volumeName,
VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: experimentsDetails.ScriptSecretName,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
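For reference, a hedged sketch (hypothetical values) of the command the helper pod ends up running, assuming ScriptSecretKey is "load.js" and ChaosDuration is 60:

k6 run /mnt/load.js -q --duration 60s --tag trace_id=<span-trace-id>

with --out experimental-opentelemetry and the K6_OTEL_* environment variables added only when the OTLP exporter endpoint (telemetry.OTELExporterOTLPEndpoint) is set.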

View File

@ -1,80 +1,104 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"time"
pod_delete "github.com/litmuschaos/litmus-go/chaoslib/litmus/pod-delete/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/workloads"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/kafka"
kafkaTypes "github.com/litmuschaos/litmus-go/pkg/kafka/types"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kafka/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
//PreparePodDelete contains the prepration steps before chaos injection
func PreparePodDelete(kafkaDetails *kafkaTypes.ExperimentDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Getting the iteration count for the pod deletion
pod_delete.GetIterations(experimentsDetails)
// PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKafkaPodDeleteFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
if experimentsDetails.ChaoslibDetail.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.ChaoslibDetail.RampTime)
common.WaitForDuration(experimentsDetails.ChaoslibDetail.RampTime)
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(kafkaDetails, experimentsDetails, clients, chaosDetails, eventsDetails); err != nil {
return err
switch strings.ToLower(experimentsDetails.ChaoslibDetail.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
} else {
if err = InjectChaosInParallelMode(kafkaDetails, experimentsDetails, clients, chaosDetails, eventsDetails); err != nil {
return err
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.ChaoslibDetail.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
if experimentsDetails.ChaoslibDetail.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.ChaoslibDetail.RampTime)
common.WaitForDuration(experimentsDetails.ChaoslibDetail.RampTime)
}
return nil
}
// InjectChaosInSerialMode delete the kafka broker pods in serial mode(one by one)
func InjectChaosInSerialMode(kafkaDetails *kafkaTypes.ExperimentDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails) error {
// injectChaosInSerialMode deletes the kafka broker pods in serial mode (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
GracePeriod := int64(0)
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now().Unix()
for count := 0; count < experimentsDetails.Iterations; count++ {
//When broker is not defined
if kafkaDetails.TargetPod == "" {
err = kafka.LaunchStreamDeriveLeader(kafkaDetails, clients)
if err != nil {
return errors.Errorf("fail to derive the leader, err: %v", err)
}
}
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
if experimentsDetails.EngineName != "" {
// deriving the parent name of the target resources
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return err
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.ChaoslibDetail.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
@ -86,72 +110,90 @@ func InjectChaosInSerialMode(kafkaDetails *kafkaTypes.ExperimentDetails, experim
log.InfoWithValues("[Info]: Killing the following pods", logrus.Fields{
"PodName": pod.Name})
if experimentsDetails.Force == true {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
if experimentsDetails.ChaoslibDetail.Force {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != 0 {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaoslibDetail.ChaosInterval != "" {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaoslibDetail.ChaosInterval)
waitTime, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.ChaosInterval)
common.WaitForDuration(waitTime)
}
}
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
}
//ChaosCurrentTimeStamp contains the current timestamp
ChaosCurrentTimeStamp := time.Now().Unix()
//ChaosDiffTimeStamp contains the difference of current timestamp and start timestamp
//It will helpful to track the total chaos duration
chaosDiffTimeStamp := ChaosCurrentTimeStamp - ChaosStartTimeStamp
if int(chaosDiffTimeStamp) >= experimentsDetails.ChaosDuration {
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
log.Infof("[Completion]: %v chaos is done", experimentsDetails.ExperimentName)
return nil
}
// InjectChaosInParallelMode delete the kafka broker pods in parallel mode (all at once)
func InjectChaosInParallelMode(kafkaDetails *kafkaTypes.ExperimentDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails) error {
// injectChaosInParallelMode deletes the kafka broker pods in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
GracePeriod := int64(0)
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now().Unix()
for count := 0; count < experimentsDetails.Iterations; count++ {
//When broker is not defined
if kafkaDetails.TargetPod == "" {
err = kafka.LaunchStreamDeriveLeader(kafkaDetails, clients)
if err != nil {
return errors.Errorf("fail to derive the leader, err: %v", err)
}
}
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
if experimentsDetails.EngineName != "" {
// deriving the parent name of the target resources
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.ChaoslibDetail.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
@ -163,39 +205,44 @@ func InjectChaosInParallelMode(kafkaDetails *kafkaTypes.ExperimentDetails, exper
log.InfoWithValues("[Info]: Killing the following pods", logrus.Fields{
"PodName": pod.Name})
if experimentsDetails.Force == true {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
if experimentsDetails.ChaoslibDetail.Force {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
}
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != 0 {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaoslibDetail.ChaosInterval != "" {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaoslibDetail.ChaosInterval)
waitTime, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.ChaosInterval)
common.WaitForDuration(waitTime)
}
}
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
//ChaosCurrentTimeStamp contains the current timestamp
ChaosCurrentTimeStamp := time.Now().Unix()
//ChaosDiffTimeStamp contains the difference of current timestamp and start timestamp
//It will helpful to track the total chaos duration
chaosDiffTimeStamp := ChaosCurrentTimeStamp - ChaosStartTimeStamp
if int(chaosDiffTimeStamp) >= experimentsDetails.ChaosDuration {
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
log.Infof("[Completion]: %v chaos is done", experimentsDetails.ExperimentName)

View File

@ -1,30 +1,39 @@
package lib
import (
"context"
"fmt"
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/kubelet-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareKubeletKill contains the preparation steps before chaos injection
func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareKubeletKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKubeletServiceKillFault")
defer span.End()
var err error
if experimentsDetails.TargetNode == "" {
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@ -32,7 +41,7 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
"NodeName": experimentsDetails.TargetNode,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -47,64 +56,34 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
}
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
// Creating the helper pod to perform kubelet-service-kill chaos
err = CreateHelperPod(experimentsDetails, clients, experimentsDetails.TargetNode)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
}
// Checking for the node to be in not-ready state
log.Info("[Status]: Check for the node to be in NotReady state")
err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("application node is not in NotReady state, err: %v", err)
if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
}
return stacktrace.Propagate(err, "could not check for NOT READY state")
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed, err: %v", err)
}
// Checking the status of target nodes
log.Info("[Status]: Getting the status of target nodes")
err = status.CheckNodeStatus(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
log.Warnf("Target nodes are not in the ready state, you may need to manually recover the node, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@ -112,28 +91,30 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appNodeName string) error {
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateKubeletServiceKillFaultHelperPod")
defer span.End()
privileged := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName,
"name": experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
Annotations: experimentsDetails.Annotations,
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Volumes: []apiv1.Volume{
{
Name: "bus",
@ -164,7 +145,7 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"-c",
"sleep 10 && systemctl stop kubelet && sleep " + strconv.Itoa(experimentsDetails.ChaosDuration) + " && systemctl start kubelet",
},
Resources: experimentsDetails.Resources,
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "bus",
@ -181,9 +162,35 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
TTY: true,
},
},
Tolerations: []apiv1.Toleration{
{
Key: "node.kubernetes.io/not-ready",
Operator: apiv1.TolerationOperator("Exists"),
Effect: apiv1.TaintEffect("NoExecute"),
TolerationSeconds: ptrint64(int64(experimentsDetails.ChaosDuration) + 60),
},
{
Key: "node.kubernetes.io/unreachable",
Operator: apiv1.TolerationOperator("Exists"),
Effect: apiv1.TaintEffect("NoExecute"),
TolerationSeconds: ptrint64(int64(experimentsDetails.ChaosDuration) + 60),
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
func ptrint64(p int64) *int64 {
return &p
}
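As an illustration (hypothetical duration): with ChaosDuration set to 60, the helper container runs

sleep 10 && systemctl stop kubelet && sleep 60 && systemctl start kubelet

and the NoExecute tolerations above keep the helper from being evicted for ChaosDuration + 60 seconds while the node is tainted as not-ready or unreachable.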

View File

@ -1,7 +1,7 @@
package main
package helper
import (
"encoding/json"
"context"
"fmt"
"os"
"os/exec"
@ -11,411 +11,413 @@ import (
"syscall"
"time"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/network_latency/tc"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/environment"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/pkg/errors"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var err error
const (
qdiscNotFound = "Cannot delete qdisc with handle of zero"
qdiscNoFileFound = "RTNETLINK answers: No such file or directory"
)
func main() {
var (
err error
inject, abort chan os.Signal
sPorts, dPorts, whitelistDPorts, whitelistSPorts []string
)
// Helper injects the network chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodNetworkFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
clients := clients.ClientSets{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
//Getting kubeConfig and Generate ClientSets
if err := clients.GenerateClientSetFromKubeConfig(); err != nil {
log.Fatalf("Unable to Get the kubeconfig, err: %v", err)
}
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed for the helper pod
log.Info("[PreReq]: Getting the ENV variables")
GetENV(&experimentsDetails)
getENV(&experimentsDetails)
// Intialise the chaos attributes
experimentEnv.InitialiseChaosVariables(&chaosDetails, &experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise Chaos Result Parameters
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
err := PreparePodNetworkChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
err := preparePodNetworkChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
//PreparePodNetworkChaos contains the prepration steps before chaos injection
func PreparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
containerID, err := GetContainerID(experimentsDetails, clients)
if err != nil {
return err
}
// extract out the pid of the target container
targetPID, err := GetPID(experimentsDetails, containerID)
if err != nil {
return err
// preparePodNetworkChaos contains the preparation steps before chaos injection
func preparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetEnv := os.Getenv("TARGETS")
if targetEnv == "" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "no target found, provide at least one target"}
}
var targets []targetDetails
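// each ';'-separated entry in TARGETS is expected as <podName>:<namespace>:<containerName>:<destinationIps>
// (hypothetical example of a single entry: "nginx-0:default:nginx:10.0.0.5|80")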
for _, t := range strings.Split(targetEnv, ";") {
target := strings.Split(t, ":")
if len(target) != 4 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported target format: '%v'", targets)}
}
td := targetDetails{
Name: target[0],
Namespace: target[1],
TargetContainer: target[2],
DestinationIps: getDestIps(target[3]),
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the network ns path of the pod sandbox or pause container
td.NetworkNsPath, err = common.GetNetworkNsPath(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container network ns path")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos
go abortWatcher(targets, experimentsDetails.NetworkInterface, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
for index, t := range targets {
// injecting network chaos inside target container
if err = injectChaos(experimentsDetails.NetworkInterface, t); err != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
msg := "Injected " + experimentsDetails.ExperimentName + " chaos on application pods"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
// injecting network chaos inside target container
if err = InjectChaos(experimentsDetails, targetPID); err != nil {
return err
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
common.WaitForDuration(experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
// updating the chaosresult after stopped
failStep := "Network Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Duration is over, reverting chaos")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
if err = tc.Killnetem(targetPID); err != nil {
log.Errorf("unable to kill netem process, err :%v", err)
}
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
log.Info("[Chaos]: Stopping the experiment")
// cleaning the netem process after chaos injection
if err = tc.Killnetem(targetPID); err != nil {
return err
if err := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
return nil
}
//GetContainerID extract out the container id of the target container
func GetContainerID(experimentDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
var containerID string
switch experimentDetails.ContainerRuntime {
case "docker":
host := "unix://" + experimentDetails.SocketPath
// deriving the container id of the pause container
cmd := "docker --host " + host + " ps | grep k8s_POD_" + experimentDetails.TargetPods + "_" + experimentDetails.AppNS + " | awk '{print $1}'"
out, err := exec.Command("/bin/sh", "-c", cmd).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[docker]: Failed to run docker ps command: %s", string(out)))
return "", err
func revertChaosForAllTargets(targets []targetDetails, networkInterface string, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
killed, err := killnetem(targets[i], networkInterface)
if !killed && err != nil {
errList = append(errList, err.Error())
continue
}
containerID = strings.TrimSpace(string(out))
case "containerd", "crio":
pod, err := clients.KubeClient.CoreV1().Pods(experimentDetails.AppNS).Get(experimentDetails.TargetPods, v1.GetOptions{})
if err != nil {
return "", err
}
// filtering out the container id from the details of containers inside containerStatuses of the given pod
// container id is present in the form of <runtime>://<container-id>
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentDetails.TargetContainer {
containerID = strings.Split(container.ContainerID, "//")[1]
break
if killed && err == nil {
if err = result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
default:
return "", errors.Errorf("%v container runtime not suported", experimentDetails.ContainerRuntime)
}
log.Infof("containerid: %v", containerID)
return containerID, nil
}
//GetPID extract out the PID of the target container
func GetPID(experimentDetails *experimentTypes.ExperimentDetails, containerID string) (int, error) {
var PID int
switch experimentDetails.ContainerRuntime {
case "docker":
host := "unix://" + experimentDetails.SocketPath
// deriving pid from the inspect out of target container
out, err := exec.Command("docker", "--host", host, "inspect", containerID).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[docker]: Failed to run docker inspect: %s", string(out)))
return 0, err
}
// parsing data from the json output of inspect command
PID, err = parsePIDFromJSON(out, experimentDetails.ContainerRuntime)
if err != nil {
log.Error(fmt.Sprintf("[docker]: Failed to parse json from docker inspect output: %s", string(out)))
return 0, err
}
case "containerd", "crio":
// deriving pid from the inspect out of target container
endpoint := "unix://" + experimentDetails.SocketPath
out, err := exec.Command("crictl", "-i", endpoint, "-r", endpoint, "inspect", containerID).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[cri]: Failed to run crictl: %s", string(out)))
return 0, err
}
// parsing data from the json output of inspect command
PID, err = parsePIDFromJSON(out, experimentDetails.ContainerRuntime)
if err != nil {
log.Errorf(fmt.Sprintf("[cri]: Failed to parse json from crictl output: %s", string(out)))
return 0, err
}
default:
return 0, errors.Errorf("%v container runtime not suported", experimentDetails.ContainerRuntime)
}
log.Info(fmt.Sprintf("[cri]: Container ID=%s has process PID=%d", containerID, PID))
return PID, nil
}
// CrictlInspectResponse JSON representation of crictl inspect command output
// in crio, pid is present inside pid attribute of inspect output
// in containerd, pid is present inside `info.pid` of inspect output
type CrictlInspectResponse struct {
Info InfoDetails `json:"info"`
}
// InfoDetails JSON representation of crictl inspect command output
type InfoDetails struct {
RuntimeSpec RuntimeDetails `json:"runtimeSpec"`
PID int `json:"pid"`
}
// RuntimeDetails contains runtime details
type RuntimeDetails struct {
Linux LinuxAttributes `json:"linux"`
}
// LinuxAttributes contains all the linux attributes
type LinuxAttributes struct {
Namespaces []Namespace `json:"namespaces"`
}
// Namespace contains linux namespace details
type Namespace struct {
Type string `json:"type"`
Path string `json:"path"`
}
// DockerInspectResponse JSON representation of docker inspect command output
type DockerInspectResponse struct {
State StateDetails `json:"state"`
}
// StateDetails JSON representation of docker inspect command output
type StateDetails struct {
PID int `json:"pid"`
}
//parsePIDFromJSON extract the pid from the json output
func parsePIDFromJSON(j []byte, runtime string) (int, error) {
var pid int
// namespaces are present inside `info.runtimeSpec.linux.namespaces` of inspect output
// linux namespace of type network contains pid, in the form of `/proc/<pid>/ns/net`
switch runtime {
case "docker":
// in docker, pid is present inside state.pid attribute of inspect output
var resp []DockerInspectResponse
if err := json.Unmarshal(j, &resp); err != nil {
return 0, err
}
pid = resp[0].State.PID
case "containerd":
var resp CrictlInspectResponse
if err := json.Unmarshal(j, &resp); err != nil {
return 0, err
}
for _, namespace := range resp.Info.RuntimeSpec.Linux.Namespaces {
if namespace.Type == "network" {
value := strings.Split(namespace.Path, "/")[2]
pid, _ = strconv.Atoi(value)
}
}
case "crio":
var info InfoDetails
if err := json.Unmarshal(j, &info); err != nil {
return 0, err
}
pid = info.PID
if pid == 0 {
var resp CrictlInspectResponse
if err := json.Unmarshal(j, &resp); err != nil {
return 0, err
}
pid = resp.Info.PID
}
default:
return 0, errors.Errorf("[cri]: No supported container runtime, runtime: %v", runtime)
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
if pid == 0 {
return 0, errors.Errorf("[cri]: No running target container found, pid: %d", pid)
}
return pid, nil
return nil
}
// InjectChaos inject the network chaos in target container
// injectChaos injects the network chaos in the target container
// it uses the nsenter command to enter the network namespace of the target container
// and executes the netem command inside it.
func InjectChaos(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
func injectChaos(netInterface string, target targetDetails) error {
netemCommands := os.Getenv("NETEM_COMMAND")
destinationIPs := os.Getenv("DESTINATION_IPS")
if destinationIPs == "" {
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc replace dev %s root netem %v", pid, experimentDetails.NetworkInterface, netemCommands)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
if len(target.DestinationIps) == 0 && len(sPorts) == 0 && len(dPorts) == 0 && len(whitelistDPorts) == 0 && len(whitelistSPorts) == 0 {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %s root %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create tc rules", target.Source); err != nil {
return err
}
} else {
ips := strings.Split(destinationIPs, ",")
var uniqueIps []string
// removing duplicates ips from the list, if any
for i := range ips {
isPresent := false
for j := range uniqueIps {
if ips[i] == uniqueIps[j] {
isPresent = true
}
}
if !isPresent {
uniqueIps = append(uniqueIps, ips[i])
}
}
// Create a priority-based queue
// This instantly creates classes 1:1, 1:2, 1:3
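// (the netem qdisc below is attached only to band 1:3; the tc filters that follow steer
// matching traffic into 1:3, while the other bands keep their default behaviour)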
priority := fmt.Sprintf("sudo nsenter -t %v -n tc qdisc replace dev %v root handle 1: prio", pid, experimentDetails.NetworkInterface)
cmd := exec.Command("/bin/bash", "-c", priority)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
priority := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v root handle 1: prio", target.NetworkNsPath, netInterface)
log.Info(priority)
if err := common.RunBashCommand(priority, "failed to create priority-based queue", target.Source); err != nil {
return err
}
// Add queueing discipline for 1:3 class.
// No traffic is going through 1:3 yet
traffic := fmt.Sprintf("sudo nsenter -t %v -n tc qdisc replace dev %v parent 1:3 netem %v", pid, experimentDetails.NetworkInterface, netemCommands)
cmd = exec.Command("/bin/bash", "-c", traffic)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
traffic := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v parent 1:3 %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(traffic)
if err := common.RunBashCommand(traffic, "failed to create netem queueing discipline", target.Source); err != nil {
return err
}
for _, ip := range uniqueIps {
if len(whitelistDPorts) != 0 || len(whitelistSPorts) != 0 {
for _, port := range whitelistDPorts {
//redirect traffic to specific dport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip dport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist dport match filters", target.Source); err != nil {
return err
}
}
// redirect traffic to specific IP through band 3
// It allows ipv4 addresses only
if !strings.Contains(ip, ":") {
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst %v flowid 1:3", pid, experimentDetails.NetworkInterface, ip)
cmd = exec.Command("/bin/bash", "-c", tc)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
for _, port := range whitelistSPorts {
//redirect traffic to specific sport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip sport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist sport match filters", target.Source); err != nil {
return err
}
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst 0.0.0.0/0 flowid 1:3", target.NetworkNsPath, netInterface)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create rule for all ports match filters", target.Source); err != nil {
return err
}
} else {
for i := range target.DestinationIps {
var (
ip = target.DestinationIps[i]
ports []string
isIPV6 = strings.Contains(target.DestinationIps[i], ":")
)
// extracting the destination ports from the ips
// ip format is ip(|port1|port2....|portx)
if strings.Contains(target.DestinationIps[i], "|") {
ip = strings.Split(target.DestinationIps[i], "|")[0]
ports = strings.Split(target.DestinationIps[i], "|")[1:]
}
// redirect traffic to specific IP through band 3
filter := fmt.Sprintf("match ip dst %v", ip)
if isIPV6 {
filter = fmt.Sprintf("match ip6 dst %v", ip)
}
if len(ports) != 0 {
for _, port := range ports {
portFilter := fmt.Sprintf("%s match ip dport %v 0xffff", filter, port)
if isIPV6 {
portFilter = fmt.Sprintf("%s match ip6 dport %v 0xffff", filter, port)
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, portFilter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
continue
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, filter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
for _, port := range sPorts {
//redirect traffic to specific sport through band 3
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip sport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create source ports match filters", target.Source); err != nil {
return err
}
}
for _, port := range dPorts {
//redirect traffic to specific dport through band 3
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ports match filters", target.Source); err != nil {
return err
}
}
}
}
log.Infof("chaos injected successfully on {pod: %v, container: %v}", target.Name, target.TargetContainer)
return nil
}
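For reference, the tc command sequence built above reduces to a priority qdisc, a netem discipline on band 1:3, and u32 filters that steer matching traffic into that band. The following standalone sketch only prints the equivalent commands for one hypothetical target (the ns path, interface, netem string, IP, and port are made-up values; the real code runs them via nsenter and common.RunBashCommand).
package main

import "fmt"

func main() {
	nsPath := "/var/run/netns/cni-1234" // hypothetical network namespace path
	netInterface := "eth0"              // hypothetical interface
	netem := "netem delay 2000ms 0ms"   // hypothetical NETEM_COMMAND value

	// 1. priority qdisc with bands 1:1, 1:2, 1:3
	fmt.Printf("sudo nsenter --net=%s tc qdisc replace dev %s root handle 1: prio\n", nsPath, netInterface)
	// 2. attach the netem discipline to band 1:3 only
	fmt.Printf("sudo nsenter --net=%s tc qdisc replace dev %s parent 1:3 %s\n", nsPath, netInterface, netem)
	// 3. steer traffic for a specific destination ip+port into band 1:3
	fmt.Printf("sudo nsenter --net=%s tc filter add dev %s protocol ip parent 1:0 prio 3 u32 match ip dst %s match ip dport %s 0xffff flowid 1:3\n",
		nsPath, netInterface, "10.0.0.5", "8080")
}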
// Killnetem kills the netem process for all the target containers
func Killnetem(PID int) error {
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc delete dev eth0 root", PID)
// killnetem kills the netem process for all the target containers
func killnetem(target targetDetails, networkInterface string) (bool, error) {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc delete dev %s root", target.NetworkNsPath, networkInterface)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
log.Info(cmd.String())
// ignoring err if qdisc process doesn't exist inside the target container
if strings.Contains(string(out), qdiscNotFound) || strings.Contains(string(out), qdiscNoFileFound) {
log.Warn("The network chaos process has already been removed")
return true, err
}
log.Error(err.Error())
return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", target.Name, target.Namespace, target.TargetContainer), Reason: fmt.Sprintf("failed to revert network faults: %s", string(out))}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", target.Name, target.Namespace, target.TargetContainer)
return true, nil
}
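The revert path above treats a "qdisc not found" reply from tc as success, since it means the chaos rule is already gone. A minimal sketch of that pattern follows; the qdiscNotFound/qdiscNoFileFound values here are assumed placeholders, not the package's actual constants.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// hypothetical substrings; the real qdiscNotFound/qdiscNoFileFound constants are defined elsewhere in the package
const (
	qdiscNotFound    = "Cannot delete qdisc with handle of zero"
	qdiscNoFileFound = "No such file or directory"
)

func main() {
	cmd := exec.Command("/bin/bash", "-c", "tc qdisc delete dev eth0 root")
	out, err := cmd.CombinedOutput()
	if err != nil {
		// if the qdisc no longer exists, the chaos was already reverted
		if strings.Contains(string(out), qdiscNotFound) || strings.Contains(string(out), qdiscNoFileFound) {
			fmt.Println("qdisc already removed, treating revert as successful")
			return
		}
		fmt.Println("revert failed:", string(out))
		return
	}
	fmt.Println("qdisc deleted")
}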
type targetDetails struct {
Name string
Namespace string
ServiceMesh string
DestinationIps []string
TargetContainer string
ContainerId string
Source string
NetworkNsPath string
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.DestinationIPs = types.Getenv("DESTINATION_IPS", "")
experimentDetails.SourcePorts = types.Getenv("SOURCE_PORTS", "")
experimentDetails.DestinationPorts = types.Getenv("DESTINATION_PORTS", "")
if strings.TrimSpace(experimentDetails.DestinationPorts) != "" {
if strings.Contains(experimentDetails.DestinationPorts, "!") {
whitelistDPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.DestinationPorts), "!"), ",")
} else {
dPorts = strings.Split(strings.TrimSpace(experimentDetails.DestinationPorts), ",")
}
}
if strings.TrimSpace(experimentDetails.SourcePorts) != "" {
if strings.Contains(experimentDetails.SourcePorts, "!") {
whitelistSPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.SourcePorts), "!"), ",")
} else {
sPorts = strings.Split(strings.TrimSpace(experimentDetails.SourcePorts), ",")
}
}
}
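The DESTINATION_PORTS and SOURCE_PORTS values read above follow a small convention: a leading "!" marks the list as a whitelist (those ports are spared), otherwise the ports are targeted. A minimal sketch of that parsing, with made-up values:
package main

import (
	"fmt"
	"strings"
)

// parsePorts mirrors the convention used by getENV: "!" means whitelist, otherwise target list.
func parsePorts(raw string) (ports, whitelist []string) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return nil, nil
	}
	if strings.Contains(raw, "!") {
		return nil, strings.Split(strings.TrimPrefix(raw, "!"), ",")
	}
	return strings.Split(raw, ","), nil
}

func main() {
	fmt.Println(parsePorts("80,443"))     // target ports 80 and 443
	fmt.Println(parsePorts("!8080,9090")) // whitelist ports 8080 and 9090
}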
// abortWatcher continuously watches for the abort signals
func abortWatcher(targets []targetDetails, networkInterface, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
killed, err := killnetem(t, networkInterface)
if err != nil && !killed {
log.Errorf("unable to kill netem process, err :%v", err)
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
func getDestIps(serviceMesh string) []string {
var (
destIps = os.Getenv("DESTINATION_IPS")
uniqueIps []string
)
if serviceMesh == "true" {
destIps = os.Getenv("DESTINATION_IPS_SERVICE_MESH")
}
if strings.TrimSpace(destIps) == "" {
return nil
}
ips := strings.Split(strings.TrimSpace(destIps), ",")
// removing duplicate ips from the list, if any
for i := range ips {
if !common.Contains(ips[i], uniqueIps) {
uniqueIps = append(uniqueIps, ips[i])
}
}
return uniqueIps
}
// GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = Getenv("EXPERIMENT_NAME", "")
experimentDetails.AppNS = Getenv("APP_NS", "")
experimentDetails.TargetContainer = Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = Getenv("APP_POD", "")
experimentDetails.AppLabel = Getenv("APP_LABEL", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = Getenv("CHAOS_ENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = Getenv("CONTAINER_RUNTIME", "")
experimentDetails.NetworkInterface = Getenv("NETWORK_INTERFACE", "eth0")
experimentDetails.SocketPath = Getenv("SOCKET_PATH", "")
experimentDetails.DestinationIPs = Getenv("DESTINATION_IPS", "")
}
// Getenv fetches the env and sets the default value, if any
func Getenv(key string, defaultValue string) string {
value := os.Getenv(key)
if value == "" {
value = defaultValue
}
return value
}

View File

@ -1,24 +1,26 @@
package corruption
import (
"strconv"
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
var err error
// PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkCorruptionFault")
defer span.End()
//PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := "corrupt " + strconv.Itoa(experimentsDetails.NetworkPacketCorruptionPercentage)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
args := "netem corrupt " + experimentsDetails.NetworkPacketCorruptionPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return nil
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
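The corruption fault hands tc a plain netem argument string; duplication and loss follow the same shape. A small illustration of how that string comes out for hypothetical tunable values:
package main

import "fmt"

func main() {
	corruptionPercentage := "100" // hypothetical NetworkPacketCorruptionPercentage
	correlation := 25             // hypothetical Correlation, optional

	args := "netem corrupt " + corruptionPercentage
	if correlation > 0 {
		args = fmt.Sprintf("%s %d", args, correlation)
	}
	fmt.Println(args) // netem corrupt 100 25
}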

View File

@ -1,24 +1,26 @@
package duplication
import (
"strconv"
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
var err error
// PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkDuplicationFault")
defer span.End()
//PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := "duplicate " + strconv.Itoa(experimentsDetails.NetworkPacketDuplicationPercentage)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
args := "netem duplicate " + experimentsDetails.NetworkPacketDuplicationPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return nil
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}

View File

@ -1,24 +1,27 @@
package latency
import (
"context"
"fmt"
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
var err error
// PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLatencyFault")
defer span.End()
//PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := "delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms"
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
args := "netem delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms " + strconv.Itoa(experimentsDetails.Jitter) + "ms"
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return nil
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
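For the latency fault, delay and jitter are both rendered in milliseconds before the optional correlation is appended. A short illustration with hypothetical values:
package main

import (
	"fmt"
	"strconv"
)

func main() {
	latency, jitter, correlation := 2000, 200, 25 // hypothetical tunables

	args := "netem delay " + strconv.Itoa(latency) + "ms " + strconv.Itoa(jitter) + "ms"
	if correlation > 0 {
		args = fmt.Sprintf("%s %d", args, correlation)
	}
	fmt.Println(args) // netem delay 2000ms 200ms 25
}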

View File

@ -1,24 +1,26 @@
package loss
import (
"strconv"
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
var err error
// PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLossFault")
defer span.End()
//PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := "loss " + strconv.Itoa(experimentsDetails.NetworkPacketLossPercentage)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
args := "netem loss " + experimentsDetails.NetworkPacketLossPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return nil
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}

View File

@ -1,39 +1,52 @@
package lib
import (
"context"
"fmt"
"net"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
var serviceMesh = []string{"istio", "envoy"}
var destIpsSvcMesh string
var destIps string
//PrepareAndInjectChaos contains the prepration & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
var err error
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
logExperimentFields(experimentsDetails)
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -43,184 +56,143 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
err = GetServiceAccount(experimentsDetails, clients)
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
}
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, args); err != nil {
return err
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, args); err != nil {
return err
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// InjectChaosInSerialMode injects the network chaos in all target applications serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string) error {
// injectChaosInSerialMode injects the network chaos in all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
labelSuffix := common.GetRunID()
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix)
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
return stacktrace.Propagate(err, "could not set destination ips")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + runID
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer, serviceMesh), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+60, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod for network chaos
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// InjectChaosInParallelMode injects the network chaos in all target applications in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string) error {
// injectChaosInParallelMode injects the network chaos in all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInParallelMode")
defer span.End()
var err error
labelSuffix := common.GetRunID()
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
targets, err := filterPodsForNodes(targetPodList, experimentsDetails, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
return stacktrace.Propagate(err, "could not filter target pods")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+60, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
runID := stringutils.GetRunID()
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s:%s", k.Name, k.Namespace, k.TargetContainer, k.ServiceMesh))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
//Deleting all the helper pods for network chaos
log.Info("[Cleanup]: Deleting all the helper pods")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pods, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
return nil
}
// GetServiceAccount find the serviceAccountName for the helper pod
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
//GetTargetContainer will fetch the container name from application pod
//This container will be used as target container
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets string, nodeName, runID, args string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodNetworkFaultHelperPod")
defer span.End()
return pod.Spec.Containers[0].Name, nil
}
// CreateHelperPod derives the attributes for the helper pod and creates the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, podName, nodeName, runID, args, labelSuffix string) error {
privilegedEnable := true
var (
privilegedEnable = true
terminationGracePeriodSeconds = int64(experimentsDetails.TerminationGracePeriodSeconds)
helperName = fmt.Sprintf("%s-helper-%s", experimentsDetails.ExperimentName, stringutils.GetRunID())
)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
Annotations: experimentsDetails.Annotations,
Name: helperName,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Tolerations: chaosDetails.Tolerations,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "cri-socket",
@ -231,24 +203,6 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
},
},
InitContainers: []apiv1.Container{
{
Name: "setup-" + experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
"-c",
"sudo chmod 777 " + experimentsDetails.SocketPath,
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
@ -260,10 +214,10 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
Args: []string{
"-c",
"./helper/network-chaos",
"./helpers -name network-chaos",
},
Resources: experimentsDetails.Resources,
Env: GetPodEnv(experimentsDetails, podName, args),
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets, args),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
@ -284,88 +238,310 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
// mount the network ns path for crio runtime
// it is required to access the sandbox network ns
if strings.ToLower(experimentsDetails.ContainerRuntime) == "crio" {
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, apiv1.Volume{
Name: "netns-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/netns",
},
},
})
helperPod.Spec.Containers[0].VolumeMounts = append(helperPod.Spec.Containers[0].VolumeMounts, apiv1.VolumeMount{
Name: "netns-path",
MountPath: "/var/run/netns",
})
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// GetPodEnv derives all the env required for the helper pod
func GetPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, args string) []apiv1.EnvVar {
// getPodEnv derives all the env required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string, args string) []apiv1.EnvVar {
var envVar []apiv1.EnvVar
ENVList := map[string]string{
"APP_NS": experimentsDetails.AppNS,
"APP_POD": podName,
"APP_CONTAINER": experimentsDetails.TargetContainer,
"TOTAL_CHAOS_DURATION": strconv.Itoa(experimentsDetails.ChaosDuration),
"CHAOS_NAMESPACE": experimentsDetails.ChaosNamespace,
"CHAOS_ENGINE": experimentsDetails.EngineName,
"CHAOS_UID": string(experimentsDetails.ChaosUID),
"CONTAINER_RUNTIME": experimentsDetails.ContainerRuntime,
"NETEM_COMMAND": args,
"NETWORK_INTERFACE": experimentsDetails.NetworkInterface,
"EXPERIMENT_NAME": experimentsDetails.ExperimentName,
"SOCKET_PATH": experimentsDetails.SocketPath,
"DESTINATION_IPS": GetTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts),
}
for key, value := range ENVList {
var perEnv apiv1.EnvVar
perEnv.Name = key
perEnv.Value = value
envVar = append(envVar, perEnv)
}
// Getting experiment pod name from downward API
experimentPodName := GetValueFromDownwardAPI("v1", "metadata.name")
var downwardEnv apiv1.EnvVar
downwardEnv.Name = "POD_NAME"
downwardEnv.ValueFrom = &experimentPodName
envVar = append(envVar, downwardEnv)
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("NETEM_COMMAND", args).
SetEnv("NETWORK_INTERFACE", experimentsDetails.NetworkInterface).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("DESTINATION_IPS", destIps).
SetEnv("DESTINATION_IPS_SERVICE_MESH", destIpsSvcMesh).
SetEnv("SOURCE_PORTS", experimentsDetails.SourcePorts).
SetEnv("DESTINATION_PORTS", experimentsDetails.DestinationPorts).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envVar
return envDetails.ENV
}
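The TARGETS env set above encodes one entry per target pod as "name:namespace:container:serviceMesh", with entries joined by ";". A minimal sketch of how such a value could be decoded on the helper side (the actual helper parser lives elsewhere in the repo; names below are hypothetical):
package main

import (
	"fmt"
	"strings"
)

type targetEntry struct {
	Name, Namespace, TargetContainer, ServiceMesh string
}

// decodeTargets splits a TARGETS-style string back into its per-pod fields.
func decodeTargets(raw string) ([]targetEntry, error) {
	var targets []targetEntry
	for _, entry := range strings.Split(raw, ";") {
		fields := strings.Split(entry, ":")
		if len(fields) != 4 {
			return nil, fmt.Errorf("malformed target entry: %q", entry)
		}
		targets = append(targets, targetEntry{fields[0], fields[1], fields[2], fields[3]})
	}
	return targets, nil
}

func main() {
	raw := "nginx-0:default:nginx:false;nginx-1:default:nginx:true" // hypothetical TARGETS value
	targets, err := decodeTargets(raw)
	fmt.Println(targets, err)
}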
// GetValueFromDownwardAPI returns the value from downwardApi
func GetValueFromDownwardAPI(apiVersion string, fieldPath string) apiv1.EnvVarSource {
downwardENV := apiv1.EnvVarSource{
FieldRef: &apiv1.ObjectFieldSelector{
APIVersion: apiVersion,
FieldPath: fieldPath,
},
}
return downwardENV
type targetsDetails struct {
Target []target
}
// GetTargetIpsArgs returns the comma separated target ips
// It fetches the ips from the target ips (if defined by users)
// it appends the ips from the host, if the target host is provided
func GetTargetIpsArgs(targetIPs, targetHosts string) string {
type target struct {
Namespace string
Name string
TargetContainer string
ServiceMesh string
}
ipsFromHost := GetIpsForTargetHosts(targetHosts)
// GetTargetIps returns the comma separated target ips
// It fetches the ips from the target ips (if defined by users)
// it appends the ips from the host, if target host is provided
func GetTargetIps(targetIPs, targetHosts string, clients clients.ClientSets, serviceMesh bool) (string, error) {
ipsFromHost, err := getIpsForTargetHosts(targetHosts, clients, serviceMesh)
if err != nil {
return "", stacktrace.Propagate(err, "could not get ips from target hosts")
}
if targetIPs == "" {
targetIPs = ipsFromHost
} else if ipsFromHost != "" {
targetIPs = targetIPs + "," + ipsFromHost
}
return targetIPs
return targetIPs, nil
}
// GetIpsForTargetHosts resolves IP addresses for comma-separated list of target hosts and returns comma-separated ips
func GetIpsForTargetHosts(targetHosts string) string {
// it derives the pod ips from the kubernetes service
func getPodIPFromService(host string, clients clients.ClientSets) ([]string, error) {
var ips []string
svcFields := strings.Split(host, ".")
if len(svcFields) != 5 {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{host: %s}", host), Reason: "provide the valid FQDN for service in '<svc-name>.<namespace>.svc.cluster.local' format"}
}
svcName, svcNs := svcFields[0], svcFields[1]
svc, err := clients.GetService(svcNs, svcName)
if err != nil {
if k8serrors.IsForbidden(err) {
log.Warnf("forbidden - failed to get %v service in %v namespace, err: %v", svcName, svcNs, err)
return ips, nil
}
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{serviceName: %s, namespace: %s}", svcName, svcNs), Reason: err.Error()}
}
if svc.Spec.Selector == nil {
return nil, nil
}
var svcSelector string
for k, v := range svc.Spec.Selector {
if svcSelector == "" {
svcSelector += fmt.Sprintf("%s=%s", k, v)
continue
}
svcSelector += fmt.Sprintf(",%s=%s", k, v)
}
pods, err := clients.ListPods(svcNs, svcSelector)
if err != nil {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{svcName: %s, podLabel: %s, namespace: %s}", svcName, svcSelector, svcNs), Reason: fmt.Sprintf("failed to derive pods from service: %s", err.Error())}
}
for _, p := range pods.Items {
if p.Status.PodIP == "" {
continue
}
ips = append(ips, p.Status.PodIP)
}
return ips, nil
}
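getPodIPFromService turns a service's .spec.selector map into a comma-separated label selector before listing the backing pods. A small standalone sketch of that step (the selector values are examples; map iteration order is not guaranteed):
package main

import (
	"fmt"
	"strings"
)

// selectorFromMap builds a "k=v,k=v" label selector string from a selector map.
func selectorFromMap(selector map[string]string) string {
	var parts []string
	for k, v := range selector {
		parts = append(parts, fmt.Sprintf("%s=%s", k, v))
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(selectorFromMap(map[string]string{"app": "carts", "tier": "backend"}))
	// e.g. app=carts,tier=backend
}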
// getIpsForTargetHosts resolves IP addresses for comma-separated list of target hosts and returns comma-separated ips
func getIpsForTargetHosts(targetHosts string, clients clients.ClientSets, serviceMesh bool) (string, error) {
if targetHosts == "" {
return ""
return "", nil
}
hosts := strings.Split(targetHosts, ",")
finalHosts := ""
var commaSeparatedIPs []string
for i := range hosts {
ips, err := net.LookupIP(hosts[i])
hosts[i] = strings.TrimSpace(hosts[i])
var (
hostName = hosts[i]
ports []string
)
if strings.Contains(hosts[i], "|") {
host := strings.Split(hosts[i], "|")
hostName = host[0]
ports = host[1:]
log.Infof("host and port: %v :%v", hostName, ports)
}
if strings.Contains(hostName, "svc.cluster.local") && serviceMesh {
ips, err := getPodIPFromService(hostName, clients)
if err != nil {
return "", stacktrace.Propagate(err, "could not get pod ips from service")
}
log.Infof("Host: {%v}, IP address: {%v}", hosts[i], ips)
if ports != nil {
for j := range ips {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j]+"|"+strings.Join(ports, "|"))
}
} else {
commaSeparatedIPs = append(commaSeparatedIPs, ips...)
}
if finalHosts == "" {
finalHosts = hosts[i]
} else {
finalHosts = finalHosts + "," + hosts[i]
}
continue
}
ips, err := net.LookupIP(hostName)
if err != nil {
log.Infof("Unknown host")
log.Warnf("Unknown host: {%v}, it won't be included in the scope of chaos", hostName)
} else {
for j := range ips {
log.Infof("IP address: %v", ips[j])
log.Infof("Host: {%v}, IP address: {%v}", hostName, ips[j])
if ports != nil {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String()+"|"+strings.Join(ports, "|"))
continue
}
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String())
}
if finalHosts == "" {
finalHosts = hosts[i]
} else {
finalHosts = finalHosts + "," + hosts[i]
}
}
}
return strings.Join(commaSeparatedIPs, ",")
if len(commaSeparatedIPs) == 0 {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("hosts: %s", targetHosts), Reason: "provided hosts are invalid, unable to resolve"}
}
log.Infof("Injecting chaos on {%v} hosts", finalHosts)
return strings.Join(commaSeparatedIPs, ","), nil
}
// SetChaosTunables sets up a random value within a given range of values.
// If the value is not provided as a range, it keeps the initially provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.NetworkPacketLossPercentage = common.ValidateRange(experimentsDetails.NetworkPacketLossPercentage)
experimentsDetails.NetworkPacketCorruptionPercentage = common.ValidateRange(experimentsDetails.NetworkPacketCorruptionPercentage)
experimentsDetails.NetworkPacketDuplicationPercentage = common.ValidateRange(experimentsDetails.NetworkPacketDuplicationPercentage)
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
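For illustration only, a range picker in the spirit of what the comment above describes: a "min-max" value yields a random value in that range, anything else is returned unchanged. This is an assumption about common.ValidateRange's behavior, not its actual implementation.
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
)

// pickFromRange is a hypothetical stand-in for a range-based tunable resolver.
func pickFromRange(value string) string {
	parts := strings.Split(value, "-")
	if len(parts) != 2 {
		return value
	}
	lo, err1 := strconv.Atoi(parts[0])
	hi, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil || hi < lo {
		return value
	}
	return strconv.Itoa(lo + rand.Intn(hi-lo+1))
}

func main() {
	fmt.Println(pickFromRange("100"))    // returned as-is
	fmt.Println(pickFromRange("50-100")) // random value between 50 and 100
}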
// isServiceMeshEnabledForPod checks if the pod contains a service mesh sidecar
func isServiceMeshEnabledForPod(pod apiv1.Pod) bool {
for _, c := range pod.Spec.Containers {
if common.SubStringExistsInSlice(c.Name, serviceMesh) {
return true
}
}
return false
}
func setDestIps(pod apiv1.Pod, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
var err error
if isServiceMeshEnabledForPod(pod) {
if destIpsSvcMesh == "" {
destIpsSvcMesh, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, true)
if err != nil {
return "false", err
}
}
return "true", nil
}
if destIps == "" {
destIps, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, false)
if err != nil {
return "false", err
}
}
return "false", nil
}
func filterPodsForNodes(targetPodList apiv1.PodList, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (map[string]*targetsDetails, error) {
targets := make(map[string]*targetsDetails)
targetContainer := experimentsDetails.TargetContainer
for _, pod := range targetPodList.Items {
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
if err != nil {
return targets, stacktrace.Propagate(err, "could not set destination ips")
}
if experimentsDetails.TargetContainer == "" {
targetContainer = pod.Spec.Containers[0].Name
}
td := target{
Name: pod.Name,
Namespace: pod.Namespace,
TargetContainer: targetContainer,
ServiceMesh: serviceMesh,
}
if targets[pod.Spec.NodeName] == nil {
targets[pod.Spec.NodeName] = &targetsDetails{
Target: []target{td},
}
} else {
targets[pod.Spec.NodeName].Target = append(targets[pod.Spec.NodeName].Target, td)
}
}
return targets, nil
}
func logExperimentFields(experimentsDetails *experimentTypes.ExperimentDetails) {
switch experimentsDetails.NetworkChaosType {
case "network-loss":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketLossPercentage": experimentsDetails.NetworkPacketLossPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-latency":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkLatency": strconv.Itoa(experimentsDetails.NetworkLatency),
"Jitter": experimentsDetails.Jitter,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-corruption":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketCorruptionPercentage": experimentsDetails.NetworkPacketCorruptionPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-duplication":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketDuplicationPercentage": experimentsDetails.NetworkPacketDuplicationPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-rate-limit":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkBandwidth": experimentsDetails.NetworkBandwidth,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
}
}

View File

@ -0,0 +1,29 @@
package rate
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkRateChaos contains the steps to prepare and inject chaos
func PodNetworkRateChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkRateLimit")
defer span.End()
args := fmt.Sprintf("tbf rate %s burst %s limit %s", experimentsDetails.NetworkBandwidth, experimentsDetails.Burst, experimentsDetails.Limit)
if experimentsDetails.PeakRate != "" {
args = fmt.Sprintf("%s peakrate %s", args, experimentsDetails.PeakRate)
}
if experimentsDetails.MinBurst != "" {
args = fmt.Sprintf("%s mtu %s", args, experimentsDetails.MinBurst)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
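The rate-limit fault builds a tbf (token bucket filter) argument string instead of a netem one. An illustration of how that string is assembled for hypothetical tunables:
package main

import "fmt"

func main() {
	bandwidth, burst, limit := "1mbit", "32kb", "64kb" // hypothetical values
	peakRate, minBurst := "2mbit", ""                  // optional tunables

	args := fmt.Sprintf("tbf rate %s burst %s limit %s", bandwidth, burst, limit)
	if peakRate != "" {
		args = fmt.Sprintf("%s peakrate %s", args, peakRate)
	}
	if minBurst != "" {
		args = fmt.Sprintf("%s mtu %s", args, minBurst)
	}
	fmt.Println(args) // tbf rate 1mbit burst 32kb limit 64kb peakrate 2mbit
}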

View File

@ -1,76 +0,0 @@
package cri
import (
"encoding/json"
"fmt"
"os/exec"
"strings"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/pkg/errors"
coreV1 "k8s.io/api/core/v1"
)
// PIDFromContainer extracts the PID from the target container
func PIDFromContainer(c coreV1.ContainerStatus) (int, error) {
containerID := strings.Split(c.ContainerID, "//")[1]
out, err := exec.Command("crictl", "inspect", containerID).CombinedOutput()
if err != nil {
log.Error(fmt.Sprintf("[cri] Failed to run crictl: %s", string(out)))
return 0, err
}
runtime := strings.Split(c.ContainerID, "://")[0]
PID, err := parsePIDFromJSON(out, runtime)
if err != nil {
log.Error(fmt.Sprintf("[cri] Failed to parse json from crictl output: %s", string(out)))
return 0, err
}
log.Info(fmt.Sprintf("[cri] Container ID=%s has process PID=%d", containerID, PID))
return PID, nil
}
// InspectResponse JSON representation of crictl inspect command output
// in crio, pid is present inside pid attribute of inspect output
// in containerd, pid is present inside `info.pid` of inspect output
type InspectResponse struct {
Info InfoDetails `json:"info"`
}
// InfoDetails JSON representation of crictl inspect command output
// in crio, pid is present inside pid attribute of inspect output
// in containerd, pid is present inside `info.pid` of inspect output
type InfoDetails struct {
PID int `json:"pid"`
}
// parsePIDFromJSON extracts the pid from the json output
func parsePIDFromJSON(j []byte, runtime string) (int, error) {
var pid int
// in crio, pid is present inside pid attribute of inspect output
// in containerd, pid is present inside `info.pid` of inspect output
if runtime == "containerd" {
var resp InspectResponse
if err := json.Unmarshal(j, &resp); err != nil {
return 0, err
}
pid = resp.Info.PID
} else if runtime == "crio" {
var resp InfoDetails
if err := json.Unmarshal(j, &resp); err != nil {
return 0, errors.Errorf("[cri] Could not find pid field in json: %s", string(j))
}
pid = resp.PID
} else {
return 0, errors.Errorf("no supported container runtime, runtime: %v", runtime)
}
if pid == 0 {
return 0, errors.Errorf("[cri] no running target container found, pid: %d", pid)
}
return pid, nil
}

View File

@ -1,78 +0,0 @@
package cri
import "testing"
func TestCrictl(t *testing.T) {
// crictl inspect output
json := `
{
"status": {
"id": "c739a31ab698e6e1c679442a538d16cc7199703c80f030e159b5de6b46e60518",
"metadata": {
"attempt": 0,
"name": "nginx-unprivileged"
},
"state": "CONTAINER_RUNNING",
"createdAt": "2020-07-28T16:50:35.84027013Z",
"startedAt": "2020-07-28T16:50:35.996159402Z",
"finishedAt": "1970-01-01T00:00:00Z",
"exitCode": 0,
"image": {
"image": "docker.io/nginxinc/nginx-unprivileged:latest"
},
"imageRef": "docker.io/nginxinc/nginx-unprivileged@sha256:0fd19475c17fff38191ef0dd3d1b949a25fd637cd64756146cc99363e580cf3a",
"reason": "",
"message": "",
"labels": {
"io.kubernetes.container.name": "nginx-unprivileged",
"io.kubernetes.pod.name": "app-7f99cf5459-gdqw7",
"io.kubernetes.pod.namespace": "myteam",
"io.kubernetes.pod.uid": "d2368c41-679f-40a8-aa5d-6a763876ef06"
},
"annotations": {
"io.kubernetes.container.hash": "ddf9b623",
"io.kubernetes.container.restartCount": "0",
"io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
"io.kubernetes.container.terminationMessagePolicy": "File",
"io.kubernetes.pod.terminationGracePeriod": "30"
},
"mounts": [
{
"containerPath": "/etc/hosts",
"hostPath": "/var/lib/kubelet/pods/d2368c41-679f-40a8-aa5d-6a763876ef06/etc-hosts",
"propagation": "PROPAGATION_PRIVATE",
"readonly": false,
"selinuxRelabel": false
},
{
"containerPath": "/dev/termination-log",
"hostPath": "/var/lib/kubelet/pods/d2368c41-679f-40a8-aa5d-6a763876ef06/containers/nginx-unprivileged/00467287",
"propagation": "PROPAGATION_PRIVATE",
"readonly": false,
"selinuxRelabel": false
},
{
"containerPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"hostPath": "/var/lib/kubelet/pods/d2368c41-679f-40a8-aa5d-6a763876ef06/volumes/kubernetes.io~secret/default-token-8lf4k",
"propagation": "PROPAGATION_PRIVATE",
"readonly": true,
"selinuxRelabel": false
}
],
"logPath": "/var/log/pods/myteam_app-7f99cf5459-gdqw7_d2368c41-679f-40a8-aa5d-6a763876ef06/nginx-unprivileged/0.log"
},
"pid": 72496,
"sandboxId": "e978d37294a29c4a7f3f668f44f33431d4b9b892e415fcddfcdf71a8d047a2f7"
}`
expectedPID := 72496
PID, err := parsePIDFromJSON([]byte(json), "crio")
if err != nil {
t.Fatalf("Fail to parse json: %s", err)
}
if PID != expectedPID {
t.Errorf("Fail to parse PID from json. Expected %d, got %d", expectedPID, PID)
}
}

View File

@ -1,66 +0,0 @@
package ip
import (
"encoding/json"
"fmt"
"os/exec"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/pkg/errors"
)
// InterfaceName returns the name of the ethernet interface of the given
// process (container). It returns an error in case none, or more than one,
// interface is present.
func InterfaceName(PID int) (string, error) {
ip := fmt.Sprintf("sudo nsenter -t %d -n ip -json link list", PID)
cmd := exec.Command("/bin/bash", "-c", ip)
out, err := cmd.CombinedOutput()
log.Info(fmt.Sprintf("[ip] %s", cmd))
if err != nil {
log.Error(fmt.Sprintf("[ip]: Failed to run ip command: %s", string(out)))
return "", err
}
links, err := parseLinksResponse(out)
if err != nil {
log.Errorf("[ip] Failed to parse json response from ip command, err: %v", err)
return "", err
}
ls := []Link{}
for _, iface := range links {
if iface.Type != "loopback" {
ls = append(ls, iface)
}
}
log.Info(fmt.Sprintf("[ip] Found %d link interface(s): %+v", len(ls), ls))
if len(ls) > 1 {
return "", errors.Errorf("[ip] Unexpected number of link interfaces for process %d. Expected 1 ethernet link, found %d",
PID, len(ls))
}
return ls[0].Name, nil
}
type LinkListResponse struct {
Links []Link
}
type Link struct {
Name string `json:"ifname"`
Type string `json:"link_type"`
Qdisc string `json:"qdisc"`
NSID int `json:"link_netnsid"`
}
func parseLinksResponse(j []byte) ([]Link, error) {
var links []Link
err := json.Unmarshal(j, &links)
if err != nil {
return nil, err
}
return links, nil
}

View File

@ -1,58 +0,0 @@
package ip
import "testing"
func TestIpLinkList(t *testing.T) {
json := `
[
{
"ifindex":1,
"ifname":"lo",
"flags":[
"LOOPBACK",
"UP",
"LOWER_UP"
],
"mtu":65536,
"qdisc":"noqueue",
"operstate":"UNKNOWN",
"linkmode":"DEFAULT",
"group":"default",
"txqlen":1000,
"link_type":"loopback",
"address":"00:00:00:00:00:00",
"broadcast":"00:00:00:00:00:00"
},
{
"ifindex":3,
"link_index":27,
"ifname":"eth0",
"flags": [
"BROADCAST",
"MULTICAST",
"UP",
"LOWER_UP"
],
"mtu":1450,
"qdisc":"noqueue",
"operstate":"UP",
"linkmode":"DEFAULT",
"group":"default",
"link_type":"ether",
"address":"0a:58:0a:80:00:0f",
"broadcast":"ff:ff:ff:ff:ff:ff",
"link_netnsid":0
}
]
`
links, err := parseLinksResponse([]byte(json))
if err != nil {
t.Fatalf("Failed to parse ip link json: %s", err)
}
expected := 2
got := len(links)
if got != expected {
t.Errorf("Failed to parse ip link json. Expected %d, got %d: %v", expected, got, links)
}
}

View File

@ -1,150 +0,0 @@
package network_latency
import (
"fmt"
. "fmt"
"os"
"os/signal"
"strconv"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/network_latency/cri"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/network_latency/tc"
"github.com/litmuschaos/litmus-go/pkg/clients"
env "github.com/litmuschaos/litmus-go/pkg/generic/network-latency/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-latency/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareNetwork function orchestrates the experiment
func PrepareNetwork(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails) error {
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
log.Info("[Chaos Target]: Fetching dependency info")
deps, err := env.Dependencies()
if err != nil {
Println(err.Error())
return err
}
log.Info("[Chaos Target]: Resolving latency target IPs")
conf, err := env.Resolver(deps)
if err != nil {
Println(err.Error())
return err
}
log.Info("[Chaos Target]: Finding the container PID")
targetPIDs, err := ChaosTargetPID(experimentsDetails.ChaosNode, experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
if err != nil {
Println(err.Error())
return err
}
for _, targetPID := range targetPIDs {
log.Info(fmt.Sprintf("[Chaos]: Apply latency to process PID=%d", targetPID))
err := tc.CreateDelayQdisc(targetPID, experimentsDetails.Latency, experimentsDetails.Jitter)
if err != nil {
log.Error("Failed to create delay, aborting experiment")
return err
}
for i, ip := range conf.IP {
port := conf.Port[i]
err = tc.AddIPFilter(targetPID, ip, port)
if err != nil {
Println(err.Error())
return err
}
}
}
log.Infof("[Chaos]: Waiting for %vs", strconv.Itoa(experimentsDetails.ChaosDuration))
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
for _, targetPID := range targetPIDs {
err = tc.Killnetem(targetPID)
if err != nil {
Println(err.Error())
}
}
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
log.Info("[Chaos]: Stopping the experiment")
for _, targetPID := range targetPIDs {
err = tc.Killnetem(targetPID)
if err != nil {
Println(err.Error())
}
}
if err != nil {
Println(err.Error())
return err
}
return nil
}
//ChaosTargetPID finds the target app PIDs
func ChaosTargetPID(chaosNode string, appNs string, appLabel string, clients clients.ClientSets) ([]int, error) {
podList, err := clients.KubeClient.CoreV1().Pods(appNs).List(metav1.ListOptions{
LabelSelector: appLabel,
FieldSelector: "spec.nodeName=" + chaosNode,
})
if err != nil {
return []int{}, err
}
if len(podList.Items) == 0 {
return []int{}, errors.Errorf("No pods with label %s were found in the namespace %s", appLabel, appNs)
}
PIDs := []int{}
for _, pod := range podList.Items {
if len(pod.Status.ContainerStatuses) == 0 {
return []int{}, errors.Errorf("Unreachable: No containers running in this pod: %+v", pod)
}
// containers in a pod share the network namespace, so any one of them
// works for our purposes
container := pod.Status.ContainerStatuses[0]
log.InfoWithValues("Found target container", logrus.Fields{
"container": container.Name,
"Pod": pod.Name,
"Status": pod.Status.Phase,
"containerID": container.ContainerID,
})
PID, err := cri.PIDFromContainer(container)
if err != nil {
return []int{}, err
}
PIDs = append(PIDs, PID)
}
log.Info(Sprintf("Found %d target process(es)", len(PIDs)))
return PIDs, nil
}

View File

@ -1,107 +0,0 @@
package tc
import (
"errors"
"fmt"
"net"
"os/exec"
"github.com/litmuschaos/litmus-go/chaoslib/litmus/network_latency/ip"
"github.com/litmuschaos/litmus-go/pkg/log"
)
func CreateDelayQdisc(PID int, latency float64, jitter float64) error {
if PID == 0 {
log.Error(fmt.Sprintf("[tc] Invalid PID=%d", PID))
return errors.New("Target PID cannot be zero")
}
if latency <= 0 {
log.Error(fmt.Sprintf("[tc] Invalid latency=%f", latency))
return errors.New("Latency should be a positive value")
}
iface, err := ip.InterfaceName(PID)
if err != nil {
return err
}
log.Info(fmt.Sprintf("[tc] CreateDelayQdisc: PID=%d interface=%s latency=%fs jitter=%fs", PID, iface, latency, jitter))
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc replace dev %s root handle 1: prio", PID, iface)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
if almostZero(jitter) {
// no jitter
tc = fmt.Sprintf("sudo nsenter -t %d -n tc qdisc replace dev %s parent 1:3 netem delay %fs", PID, iface, latency)
} else {
tc = fmt.Sprintf("sudo nsenter -t %d -n tc qdisc replace dev %s parent 1:3 netem delay %fs %fs", PID, iface, latency, jitter)
}
cmd = exec.Command("/bin/bash", "-c", tc)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
return nil
}
func AddIPFilter(PID int, IP net.IP, port int) error {
if PID == 0 {
log.Error(fmt.Sprintf("[tc] Invalid PID=%d", PID))
return errors.New("Target PID cannot be zero")
}
if port == 0 {
log.Error(fmt.Sprintf("[tc] Invalid Port=%d", port))
return errors.New("Port cannot be zero")
}
log.Info(fmt.Sprintf("[tc] AddIPFilter: Target PID=%d, destination IP=%s, destination Port=%d", PID, IP, port))
tc := fmt.Sprintf("sudo nsenter -t %d -n tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst %s match ip dport %d 0xffff flowid 1:3", PID, IP, port)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
return nil
}
func Killnetem(PID int) error {
if PID == 0 {
log.Error(fmt.Sprintf("[tc] Invalid PID=%d", PID))
return errors.New("Target PID cannot be zero")
}
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc delete dev eth0 root", PID)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
return nil
}
// floats are complicated ¯\_(ツ)_/¯ they can't be compared for exact equality
// due to precision limits, so treat anything below a small epsilon as zero
func almostZero(f float64) bool {
return f < 0.0000001
}
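
Editorial aside (not part of the diff above): the almostZero helper embodies the comment's point that floating-point values can't be compared exactly. Below is a minimal sketch of the same idea generalized to comparing two floats within a tolerance, assuming the same 1e-7 epsilon; almostEqual is an illustrative name, not an existing helper in this repository.

package main

import (
	"fmt"
	"math"
)

const epsilon = 1e-7

// almostEqual reports whether a and b differ by less than epsilon.
func almostEqual(a, b float64) bool {
	return math.Abs(a-b) < epsilon
}

func main() {
	fmt.Println(almostEqual(0.1+0.2, 0.3))    // true: the difference is far below epsilon
	fmt.Println(almostEqual(0.0, 0.00000001)) // true: below the epsilon threshold
}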

View File

@ -1,25 +1,44 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-cpu-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
// PrepareNodeCPUHog contains preparation steps before chaos injection
func PrepareNodeCPUHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeCPUHogFault")
defer span.End()
// PrepareNodeCPUHog contains prepration steps before chaos injection
func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Node CPU Cores": experimentsDetails.NodeCPUcores,
"CPU Load": experimentsDetails.CPULoad,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -28,36 +47,34 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
}
//Select node for node-cpu-hog
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodesAffectedPerc, clients)
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
"Node Names": targetNodeList,
})
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -68,11 +85,19 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
return nil
}
// InjectChaosInSerialMode stress the cpu of all the target nodes serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode stress the cpu of all the target nodes serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInSerialMode")
defer span.End()
nodeCPUCores := experimentsDetails.NodeCPUcores
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
for _, appNode := range targetNodeList {
@ -83,68 +108,68 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// When the number of CPU cores for hogging is not defined, it is taken from the node capacity
if nodeCPUCores == 0 {
err = SetCPUCapacity(experimentsDetails, appNode, clients)
if err != nil {
return err
if nodeCPUCores == "0" {
if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not get node cpu capacity")
}
}
log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
"NodeName": appNode,
"NodeCPUcores": experimentsDetails.NodeCPUcores,
"NodeCPUCores": experimentsDetails.NodeCPUcores,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
// Creating the helper pod to perform node cpu hog
err = CreateHelperPod(experimentsDetails, appNode, clients, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
// Checking the status of target nodes
log.Info("[Status]: Getting the status of target nodes")
err = status.CheckNodeStatus(appNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
log.Warnf("Target nodes are not in the ready state, you may need to manually recover the node, err: %v", err)
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
}
}
return nil
}
// InjectChaosInParallelMode stress the cpu of all the target nodes in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInParallelMode stress the cpu of all the target nodes in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInParallelMode")
defer span.End()
nodeCPUCores := experimentsDetails.NodeCPUcores
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@ -154,10 +179,9 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
}
// When the number of CPU cores for hogging is not defined, it is taken from the node capacity
if nodeCPUCores == 0 {
err = SetCPUCapacity(experimentsDetails, appNode, clients)
if err != nil {
return err
if nodeCPUCores == "0" {
if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not get node cpu capacity")
}
}
@ -166,87 +190,50 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"NodeCPUcores": experimentsDetails.NodeCPUcores,
})
experimentsDetails.RunID = common.GetRunID()
// Creating the helper pod to perform node cpu hog
err = CreateHelperPod(experimentsDetails, appNode, clients, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pods")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
for _, appNode := range targetNodeList {
// Checking the status of application node
log.Info("[Status]: Getting the status of application node")
err = status.CheckNodeStatus(appNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
log.Warn("Application node is not in the ready state, you may need to manually recover the node")
}
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
return nil
}
//SetCPUCapacity fetch the node cpu capacity
func SetCPUCapacity(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets) error {
node, err := clients.KubeClient.CoreV1().Nodes().Get(appNode, v1.GetOptions{})
if err != nil {
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
cpuCapacity, _ := node.Status.Capacity.Cpu().AsInt64()
experimentsDetails.NodeCPUcores = int(cpuCapacity)
return nil
}
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets, labelSuffix string) error {
// setCPUCapacity fetch the node cpu capacity
func setCPUCapacity(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets) error {
node, err := clients.GetNode(appNode, experimentsDetails.Timeout, experimentsDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNode), Reason: err.Error()}
}
experimentsDetails.NodeCPUcores = node.Status.Capacity.Cpu().String()
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeCPUHogFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
Annotations: experimentsDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNode,
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNode,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
@ -257,16 +244,35 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, appN
},
Args: []string{
"--cpu",
strconv.Itoa(experimentsDetails.NodeCPUcores),
experimentsDetails.NodeCPUcores,
"--cpu-load",
experimentsDetails.CPULoad,
"--timeout",
strconv.Itoa(experimentsDetails.ChaosDuration),
},
Resources: experimentsDetails.Resources,
Resources: chaosDetails.Resources,
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// setChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.NodeCPUcores = common.ValidateRange(experimentsDetails.NodeCPUcores)
experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)
experimentsDetails.NodesAffectedPerc = common.ValidateRange(experimentsDetails.NodesAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
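
Editorial aside (not part of the diff above): setChaosTunables resolves each tunable through common.ValidateRange, which accepts either a plain value or a range. Below is a minimal sketch of what such a range-resolving helper can look like, assuming hyphen-separated bounds like "2-5"; resolveRange is a hypothetical stand-in and may differ from the actual common.ValidateRange implementation.

package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
)

// resolveRange is a hypothetical stand-in for a range tunable resolver:
// "2-5" picks a random integer between 2 and 5 (inclusive); anything that
// is not a well-formed range is returned unchanged.
func resolveRange(value string) string {
	parts := strings.Split(value, "-")
	if len(parts) != 2 {
		return value
	}
	min, err1 := strconv.Atoi(parts[0])
	max, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil || max < min {
		return value
	}
	return strconv.Itoa(min + rand.Intn(max-min+1))
}

func main() {
	fmt.Println(resolveRange("2"))   // "2" (plain value, unchanged)
	fmt.Println(resolveRange("2-5")) // e.g. "4" (random value within the range)
}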

View File

@ -1,32 +1,53 @@
package lib
import (
"bytes"
"context"
"fmt"
"os"
"os/exec"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-drain/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
apierrors "k8s.io/apimachinery/pkg/api/errors"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
var (
err error
inject, abort chan os.Signal
)
//PrepareNodeDrain contains the prepration steps before chaos injection
func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeDrain contains the preparation steps before chaos injection
func PrepareNodeDrain(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeDrainFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -36,9 +57,9 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
if experimentsDetails.TargetNode == "" {
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@ -48,81 +69,56 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)
// Drain the application node
err := DrainNode(experimentsDetails, clients)
if err != nil {
return err
if err := drainNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because error during draining of node")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return stacktrace.Propagate(err, "could not drain node")
}
// Verify the status of AUT after reschedule
log.Info("[Status]: Verify the status of AUT after reschedule")
err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("Application status check failed, err: %v", err)
if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because application status check failed")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return err
}
// Verify the status of Auxiliary Applications after reschedule
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running")
err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("Auxiliary Applications status check failed, err: %v", err)
if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return err
}
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
common.WaitForDuration(experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
// updating the chaosresult after stopped
failStep := "Node Drain injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
if err = UncordonNode(experimentsDetails, clients); err != nil {
log.Errorf("unable to uncordon node, err :%v", err)
}
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
log.Info("[Chaos]: Stopping the experiment")
// Uncordon the application node
err = UncordonNode(experimentsDetails, clients)
if err != nil {
return err
}
// Checking the status of target nodes
log.Info("[Status]: Getting the status of target nodes")
err = status.CheckNodeStatus(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
log.Warnf("Target nodes are not in the ready state, you may need to manually recover the node, err: %v", err)
if err := uncordonNode(experimentsDetails, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not uncordon the target node")
}
//Waiting for the ramp time after chaos injection
@ -133,60 +129,106 @@ loop:
return nil
}
// DrainNode drain the application node
func DrainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
// drainNode drain the target node
func drainNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeDrainFault")
defer span.End()
log.Infof("[Inject]: Draining the %v node", experimentsDetails.TargetNode)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
log.Infof("[Inject]: Draining the %v node", experimentsDetails.TargetNode)
command := exec.Command("kubectl", "drain", experimentsDetails.TargetNode, "--ignore-daemonsets", "--delete-local-data", "--force", "--timeout", strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
var out, stderr bytes.Buffer
command.Stdout = &out
command.Stderr = &stderr
if err := command.Run(); err != nil {
log.Infof("Error String: %v", stderr.String())
return fmt.Errorf("Unable to drain the %v node, err: %v", experimentsDetails.TargetNode, err)
command := exec.Command("kubectl", "drain", experimentsDetails.TargetNode, "--ignore-daemonsets", "--delete-emptydir-data", "--force", "--timeout", strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), "failed to drain the target node", cerrors.ErrorTypeChaosInject); err != nil {
return err
}
common.SetTargets(experimentsDetails.TargetNode, "injected", "node", chaosDetails)
return retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
if !nodeSpec.Spec.Unschedulable {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: "node is not in unschedule state"}
}
return nil
})
}
return nil
}
// uncordonNode uncordon the application node
func uncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
targetNodes := strings.Split(experimentsDetails.TargetNode, ",")
for _, targetNode := range targetNodes {
//Check node exist before uncordon the node
_, err := clients.GetNode(targetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
if apierrors.IsNotFound(err) {
log.Infof("[Info]: The %v node is no longer exist, skip uncordon the node", targetNode)
common.SetTargets(targetNode, "noLongerExist", "node", chaosDetails)
continue
} else {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
}
}
log.Infof("[Recover]: Uncordon the %v node", targetNode)
command := exec.Command("kubectl", "uncordon", targetNode)
if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", targetNode), "failed to uncordon the target node", cerrors.ErrorTypeChaosInject); err != nil {
return err
}
common.SetTargets(targetNode, "reverted", "node", chaosDetails)
}
return retry.
Times(90).
Wait(1 * time.Second).
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil {
return err
}
if !nodeSpec.Spec.Unschedulable {
return errors.Errorf("%v node is not in unschedulable state", experimentsDetails.TargetNode)
targetNodes := strings.Split(experimentsDetails.TargetNode, ",")
for _, targetNode := range targetNodes {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), targetNode, v1.GetOptions{})
if err != nil {
if apierrors.IsNotFound(err) {
continue
} else {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
}
}
if nodeSpec.Spec.Unschedulable {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: "target node is in unschedule state"}
}
}
return nil
})
}
// UncordonNode uncordon the application node
func UncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
// abortWatcher continuously watch for the abort signals
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails) {
// waiting till the abort signal received
<-abort
log.Infof("[Recover]: Uncordon the %v node", experimentsDetails.TargetNode)
command := exec.Command("kubectl", "uncordon", experimentsDetails.TargetNode)
var out, stderr bytes.Buffer
command.Stdout = &out
command.Stderr = &stderr
if err := command.Run(); err != nil {
log.Infof("Error String: %v", stderr.String())
return fmt.Errorf("Unable to uncordon the %v node, err: %v", experimentsDetails.TargetNode, err)
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err := uncordonNode(experimentsDetails, clients, chaosDetails); err != nil {
log.Errorf("Unable to uncordon the node, err: %v", err)
}
retry--
time.Sleep(1 * time.Second)
}
return retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil {
return err
}
if nodeSpec.Spec.Unschedulable {
return errors.Errorf("%v node is in unschedulable state", experimentsDetails.TargetNode)
}
return nil
})
log.Info("Chaos Revert Completed")
os.Exit(0)
}

View File

@ -1,25 +1,45 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-io-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
// PrepareNodeIOStress contains preparation steps before chaos injection
func PrepareNodeIOStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeIOStressFault")
defer span.End()
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
// PrepareNodeIOStress contains prepration steps before chaos injection
func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"CPU Core": experimentsDetails.CPU,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -28,9 +48,10 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
}
//Select node for node-io-stress
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodesAffectedPerc, clients)
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
@ -38,26 +59,22 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
})
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -68,10 +85,18 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
return nil
}
// InjectChaosInSerialMode stress the io of all the target nodes serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode stress the io of all the target nodes serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
labelSuffix := common.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@ -86,55 +111,45 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
// Creating the helper pod to perform node io stress
err = CreateHelperPod(experimentsDetails, appNode, clients, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
common.SetTargets(appNode, "targeted", "node", chaosDetails)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
// Checking the status of target nodes
log.Info("[Status]: Getting the status of target nodes")
err = status.CheckNodeStatus(appNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
log.Warnf("Target nodes are not in the ready state, you may need to manually recover the node, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
}
// InjectChaosInParallelMode stress the io of all the target nodes in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInParallelMode stress the io of all the target nodes in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
labelSuffix := common.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@ -149,72 +164,43 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
})
experimentsDetails.RunID = common.GetRunID()
// Creating the helper pod to perform node io stress
err = CreateHelperPod(experimentsDetails, appNode, clients, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
for _, appNode := range targetNodeList {
// Checking the status of application node
log.Info("[Status]: Getting the status of application node")
err = status.CheckNodeStatus(appNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
log.Warn("Application node is not in the ready state, you may need to manually recover the node")
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
}
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets, labelSuffix string) error {
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeIOStressFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
Annotations: experimentsDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNode,
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNode,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
@ -223,42 +209,54 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, appN
Command: []string{
"stress-ng",
},
Args: GetContainerArguments(experimentsDetails),
Resources: experimentsDetails.Resources,
Args: getContainerArguments(experimentsDetails),
Resources: chaosDetails.Resources,
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// GetContainerArguments derives the args for the pumba stress helper pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
var hddbytes string
if experimentsDetails.FilesystemUtilizationBytes == 0 {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
if experimentsDetails.FilesystemUtilizationBytes == "0" {
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = "10%"
log.Info("Neither of FilesystemUtilizationPercentage or FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
}
} else {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationBytes) + "G"
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = experimentsDetails.FilesystemUtilizationBytes + "G"
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
stressArgs := []string{
"--cpu",
experimentsDetails.CPU,
"--vm",
experimentsDetails.VMWorkers,
"--io",
strconv.Itoa(experimentsDetails.NumberOfWorkers),
experimentsDetails.NumberOfWorkers,
"--hdd",
strconv.Itoa(experimentsDetails.NumberOfWorkers),
experimentsDetails.NumberOfWorkers,
"--hdd-bytes",
hddbytes,
"--timeout",
@ -268,3 +266,15 @@ func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails
}
return stressArgs
}
// setChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.FilesystemUtilizationBytes = common.ValidateRange(experimentsDetails.FilesystemUtilizationBytes)
experimentsDetails.FilesystemUtilizationPercentage = common.ValidateRange(experimentsDetails.FilesystemUtilizationPercentage)
experimentsDetails.CPU = common.ValidateRange(experimentsDetails.CPU)
experimentsDetails.VMWorkers = common.ValidateRange(experimentsDetails.VMWorkers)
experimentsDetails.NumberOfWorkers = common.ValidateRange(experimentsDetails.NumberOfWorkers)
experimentsDetails.NodesAffectedPerc = common.ValidateRange(experimentsDetails.NodesAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
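
Editorial aside (not part of the diff above): getContainerArguments chooses the stress-ng --hdd-bytes value from two tunables, preferring the percentage when both are set and defaulting to 10% when neither is provided. Below is a minimal sketch of that selection logic in isolation; hddBytesFor is an illustrative name, not a helper in this repository.

package main

import "fmt"

// hddBytesFor mirrors the --hdd-bytes selection shown above: the percentage
// wins when both tunables are set, and 10% is the default when neither is
// provided ("0" means unset).
func hddBytesFor(utilBytes, utilPercentage string) string {
	if utilBytes == "0" {
		if utilPercentage == "0" {
			return "10%" // neither provided: default percentage
		}
		return utilPercentage + "%"
	}
	if utilPercentage == "0" {
		return utilBytes + "G" // only bytes provided, in gibibytes
	}
	return utilPercentage + "%" // both provided: percentage takes precedence
}

func main() {
	fmt.Println(hddBytesFor("0", "0"))  // "10%"
	fmt.Println(hddBytesFor("4", "0"))  // "4G"
	fmt.Println(hddBytesFor("4", "15")) // "15%"
}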

View File

@ -1,23 +1,44 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeMemoryHog contains prepration steps before chaos injection
func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeMemoryHog contains preparation steps before chaos injection
func PrepareNodeMemoryHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeMemoryHogFault")
defer span.End()
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
"MemoryConsumptionMebibytes": experimentsDetails.MemoryConsumptionMebibytes,
"MemoryConsumptionPercentage": experimentsDetails.MemoryConsumptionPercentage,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -26,36 +47,34 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
}
//Select node for node-memory-hog
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodesAffectedPerc, clients)
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
"Node Names": targetNodeList,
})
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -66,10 +85,18 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
return nil
}
// InjectChaosInSerialMode stress the memory of all the target nodes serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode stresses the memory of all the target nodes serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
labelSuffix := common.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@ -84,69 +111,50 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
//Getting node memory details
memoryCapacity, memoryAllocatable, err := GetNodeMemoryDetails(appNode, clients)
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil {
return errors.Errorf("Unable to get the node memory details, err: %v", err)
return stacktrace.Propagate(err, "could not get node memory details")
}
//Getting the exact memory value to exhaust
MemoryConsumption, err := CalculateMemoryConsumption(experimentsDetails, clients, memoryCapacity, memoryAllocatable)
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil {
return errors.Errorf("memory calculation failed, err: %v", err)
return stacktrace.Propagate(err, "could not calculate memory consumption value")
}
// Creating the helper pod to perform node memory hog
err = CreateHelperPod(experimentsDetails, appNode, clients, labelSuffix, MemoryConsumption)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
} else if podStatus == "Failed" {
return errors.Errorf("helper pod status is %v", podStatus)
}
// Checking the status of target nodes
log.Info("[Status]: Getting the status of target nodes")
err = status.CheckNodeStatus(appNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
log.Warnf("Target nodes are not in the ready state, you may need to manually recover the node, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
}
// InjectChaosInParallelMode stress the memory all the target nodes in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInParallelMode stresses the memory of all the target nodes in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
labelSuffix := common.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@ -161,102 +169,69 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
})
experimentsDetails.RunID = common.GetRunID()
//Getting node memory details
memoryCapacity, memoryAllocatable, err := GetNodeMemoryDetails(appNode, clients)
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil {
return errors.Errorf("Unable to get the node memory details, err: %v", err)
return stacktrace.Propagate(err, "could not get node memory details")
}
//Getting the exact memory value to exhaust
MemoryConsumption, err := CalculateMemoryConsumption(experimentsDetails, clients, memoryCapacity, memoryAllocatable)
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil {
return errors.Errorf("memory calculation failed, err: %v", err)
return stacktrace.Propagate(err, "could not calculate memory consumption value")
}
// Creating the helper pod to perform node memory hog
err = CreateHelperPod(experimentsDetails, appNode, clients, labelSuffix, MemoryConsumption)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration+30)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
} else if podStatus == "Failed" {
return errors.Errorf("helper pod status is %v", podStatus)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
for _, appNode := range targetNodeList {
// Checking the status of application node
log.Info("[Status]: Getting the status of application node")
err = status.CheckNodeStatus(appNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
log.Warn("Application node is not in the ready state, you may need to manually recover the node")
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
}
// GetNodeMemoryDetails will return the total memory capacity and memory allocatable of an application node
func GetNodeMemoryDetails(appNodeName string, clients clients.ClientSets) (int, int, error) {
nodeDetails, err := clients.KubeClient.CoreV1().Nodes().Get(appNodeName, v1.GetOptions{})
// getNodeMemoryDetails will return the total memory capacity and memory allocatable of an application node
func getNodeMemoryDetails(appNodeName string, clients clients.ClientSets) (int, int, error) {
nodeDetails, err := clients.GetNode(appNodeName, 180, 2)
if err != nil {
return 0, 0, err
return 0, 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: err.Error()}
}
memoryCapacity := int(nodeDetails.Status.Capacity.Memory().Value())
memoryAllocatable := int(nodeDetails.Status.Allocatable.Memory().Value())
if memoryCapacity == 0 || memoryAllocatable == 0 {
return memoryCapacity, memoryAllocatable, errors.Errorf("Failed to get memory details of the application node")
return memoryCapacity, memoryAllocatable, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: "failed to get memory details of the target node"}
}
return memoryCapacity, memoryAllocatable, nil
}
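The capacity and allocatable reads above reduce to resource.Quantity lookups on the node status; the standalone snippet below demonstrates them on a fabricated node object (the 8Gi/7500Mi figures are illustrative).
package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	node := apiv1.Node{
		Status: apiv1.NodeStatus{
			Capacity:    apiv1.ResourceList{apiv1.ResourceMemory: resource.MustParse("8Gi")},
			Allocatable: apiv1.ResourceList{apiv1.ResourceMemory: resource.MustParse("7500Mi")},
		},
	}
	// Value() returns the quantity in bytes, matching the int conversion above.
	memoryCapacity := int(node.Status.Capacity.Memory().Value())
	memoryAllocatable := int(node.Status.Allocatable.Memory().Value())
	fmt.Println(memoryCapacity, memoryAllocatable)
}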
// CalculateMemoryConsumption will calculate the amount of memory to be consumed for a given unit.
func CalculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, memoryCapacity, memoryAllocatable int) (string, error) {
// calculateMemoryConsumption will calculate the amount of memory to be consumed for a given unit.
func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDetails, memoryCapacity, memoryAllocatable int) (string, error) {
var totalMemoryConsumption int
var MemoryConsumption string
var selector string
if experimentsDetails.MemoryConsumptionMebibytes == 0 {
if experimentsDetails.MemoryConsumptionPercentage == 0 {
log.Info("Neither of MemoryConsumptionPercentage or MemoryConsumptionMebibytes provided, proceeding with a default MemoryConsumptionPercentage value of 30%")
if experimentsDetails.MemoryConsumptionMebibytes == "0" {
if experimentsDetails.MemoryConsumptionPercentage == "0" {
log.Info("Neither of MemoryConsumptionPercentage or MemoryConsumptionMebibytes provided, proceeding with a default MemoryConsumptionPercentage value of 30%%")
return "30%", nil
}
selector = "percentage"
} else {
if experimentsDetails.MemoryConsumptionPercentage == 0 {
if experimentsDetails.MemoryConsumptionPercentage == "0" {
selector = "mebibytes"
} else {
log.Warn("Both MemoryConsumptionPercentage & MemoryConsumptionMebibytes provided as inputs, using the MemoryConsumptionPercentage value to proceed with the experiment")
@ -269,12 +244,13 @@ func CalculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
case "percentage":
//Getting the total memory under chaos
memoryForChaos := ((float64(experimentsDetails.MemoryConsumptionPercentage) / 100) * float64(memoryCapacity))
memoryConsumptionPercentage, _ := strconv.ParseFloat(experimentsDetails.MemoryConsumptionPercentage, 64)
memoryForChaos := (memoryConsumptionPercentage / 100) * float64(memoryCapacity)
//Get the percentage of memory under chaos wrt allocatable memory
totalMemoryConsumption = int((float64(memoryForChaos) / float64(memoryAllocatable)) * 100)
totalMemoryConsumption = int((memoryForChaos / float64(memoryAllocatable)) * 100)
if totalMemoryConsumption > 100 {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %d percent, which is more than 100 percent (%d percent) of Allocatable Memory, so the experiment will only consume upto 100 percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is more than 100 percent (%d percent) of Allocatable Memory, so the experiment will only consume upto 100 percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
MemoryConsumption = "100%"
} else {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is %d percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
@ -286,7 +262,9 @@ func CalculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
// Bringing all the values in Ki unit to compare
// since 1Mi = 1025.390625Ki
TotalMemoryConsumption := float64(experimentsDetails.MemoryConsumptionMebibytes) * 1025.390625
memoryConsumptionMebibytes, _ := strconv.ParseFloat(experimentsDetails.MemoryConsumptionMebibytes, 64)
TotalMemoryConsumption := memoryConsumptionMebibytes * 1025.390625
// since 1Ki = 1024 bytes
memoryAllocatable := memoryAllocatable / 1024
@ -294,31 +272,32 @@ func CalculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
MemoryConsumption = strconv.Itoa(memoryAllocatable) + "k"
log.Infof("[Info]: The memory for consumption %vKi is more than the available memory %vKi, so the experiment will hog the memory upto %vKi", int(TotalMemoryConsumption), memoryAllocatable, memoryAllocatable)
} else {
MemoryConsumption = strconv.Itoa(experimentsDetails.MemoryConsumptionMebibytes) + "m"
MemoryConsumption = experimentsDetails.MemoryConsumptionMebibytes + "m"
}
return MemoryConsumption, nil
}
return "", errors.Errorf("please specify the memory consumption value either in percentage or mebibytes in a non-decimal format using respective envs")
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: "specify the memory consumption value either in percentage or mebibytes in a non-decimal format using respective envs"}
}
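To make the two branches concrete, the arithmetic below walks through both with illustrative numbers (8Gi capacity, 7500Mi allocatable, a 50% request, a 512Mi request); the 1025.390625 factor simply mirrors the conversion used above.
package main

import "fmt"

func main() {
	memoryCapacity := int64(8) << 30       // 8Gi in bytes (illustrative)
	memoryAllocatable := int64(7500) << 20 // 7500Mi in bytes (illustrative)

	// Percentage branch: 50% of capacity, re-expressed as a share of allocatable.
	memoryForChaos := 0.50 * float64(memoryCapacity)
	fmt.Printf("%d%%\n", int(memoryForChaos/float64(memoryAllocatable)*100)) // 54%

	// Mebibytes branch: compare the request with allocatable, both in Ki.
	requestKi := 512 * 1025.390625
	allocatableKi := memoryAllocatable / 1024
	fmt.Println(int64(requestKi), allocatableKi) // 525000 < 7680000, so "512m" is used
}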
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets, labelSuffix, MemoryConsumption string) error {
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, MemoryConsumption string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeMemoryHogFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
Annotations: experimentsDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNode,
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNode,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
@ -329,18 +308,36 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, appN
},
Args: []string{
"--vm",
"1",
experimentsDetails.NumberOfWorkers,
"--vm-bytes",
MemoryConsumption,
"--timeout",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
},
Resources: experimentsDetails.Resources,
Resources: chaosDetails.Resources,
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// setChaosTunables will set up a random value within the given range of values
// If the value is not provided as a range, it'll keep the initially provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.MemoryConsumptionMebibytes = common.ValidateRange(experimentsDetails.MemoryConsumptionMebibytes)
experimentsDetails.MemoryConsumptionPercentage = common.ValidateRange(experimentsDetails.MemoryConsumptionPercentage)
experimentsDetails.NumberOfWorkers = common.ValidateRange(experimentsDetails.NumberOfWorkers)
experimentsDetails.NodesAffectedPerc = common.ValidateRange(experimentsDetails.NodesAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}

View File

@ -1,24 +1,26 @@
package lib
import (
"context"
"fmt"
"math/rand"
"strconv"
"time"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-restart/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
k8stypes "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
schedulerapi "k8s.io/kubernetes/pkg/scheduler/api"
)
var err error
@ -32,21 +34,29 @@ const (
privateKeySecret string = "private-key-cm-"
emptyDirVolume string = "empty-dir-"
ObjectNameField = "metadata.name"
)
// PrepareNodeRestart contains preparation steps before chaos injection
func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareNodeRestart(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeRestartFault")
defer span.End()
//Select the node
if experimentsDetails.TargetNode == "" {
//Select node for node-restart
targetNode, err := GetNode(experimentsDetails, clients)
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
experimentsDetails.TargetNode = targetNode.Spec.NodeName
experimentsDetails.TargetNodeIP = targetNode.Status.HostIP
// get the node ip
if experimentsDetails.TargetNodeIP == "" {
experimentsDetails.TargetNodeIP, err = getInternalIP(experimentsDetails.TargetNode, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get internal ip")
}
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
@ -54,15 +64,7 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
"Target Node IP": experimentsDetails.TargetNodeIP,
})
// Checking the status of target node
log.Info("[Status]: Getting the status of target node")
err = status.CheckNodeStatus(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("Target node is not in ready state, err: %v", err)
}
experimentsDetails.RunID = common.GetRunID()
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -74,56 +76,26 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.TargetNode + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
}
}
// Creating the helper pod to perform node restart
err = CreateHelperPod(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
err = CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
// Checking the status of application node
log.Info("[Status]: Getting the status of application node")
err = status.CheckNodeStatus(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
log.Warnf("Application node is not in the ready state, you may need to manually recover the node, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@ -131,38 +103,40 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", strconv.Itoa(experimentsDetails.RampTime))
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// CreateHelperPod derive the attributes for helper pod and create the helper pod
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, clients clients.ClientSets) error {
// This method attaches an emptyDir along with the secret volume and copies data from the secret
// to the emptyDir, because the secret is mounted as read-only with 777 perms and can't be changed
// because of: https://github.com/kubernetes/kubernetes/issues/57923
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeRestartFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-" + "helper",
"name": experimentsDetails.ExperimentName + "-" + experimentsDetails.RunID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
Annotations: experimentsDetails.Annotations,
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
Affinity: &k8stypes.Affinity{
NodeAffinity: &k8stypes.NodeAffinity{
RequiredDuringSchedulingIgnoredDuringExecution: &k8stypes.NodeSelector{
NodeSelectorTerms: []k8stypes.NodeSelectorTerm{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Affinity: &apiv1.Affinity{
NodeAffinity: &apiv1.NodeAffinity{
RequiredDuringSchedulingIgnoredDuringExecution: &apiv1.NodeSelector{
NodeSelectorTerms: []apiv1.NodeSelectorTerm{
{
MatchFields: []k8stypes.NodeSelectorRequirement{
MatchFields: []apiv1.NodeSelectorRequirement{
{
Key: schedulerapi.NodeFieldSelectorKeyNodeName,
Operator: k8stypes.NodeSelectorOpNotIn,
Key: ObjectNameField,
Operator: apiv1.NodeSelectorOpNotIn,
Values: []string{experimentsDetails.TargetNode},
},
},
@ -180,7 +154,7 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"/bin/sh",
},
Args: []string{"-c", fmt.Sprintf("cp %[1]s %[2]s && chmod 400 %[2]s && ssh -o \"StrictHostKeyChecking=no\" -o \"UserKnownHostsFile=/dev/null\" -i %[2]s %[3]s@%[4]s %[5]s", privateKeyPath, emptyDirPath, experimentsDetails.SSHUser, experimentsDetails.TargetNodeIP, experimentsDetails.RebootCommand)},
Resources: experimentsDetails.Resources,
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: privateKeySecret + experimentsDetails.RunID,
@ -212,39 +186,28 @@ func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
//GetNode will select a random replica of application pod and return the node spec of that application pod
func GetNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (*k8stypes.Pod, error) {
podList, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).List(v1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(podList.Items) == 0 {
return nil, errors.Wrapf(err, "Fail to get the application pod in %v namespace, err: %v", experimentsDetails.AppNS, err)
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
rand.Seed(time.Now().Unix())
randomIndex := rand.Intn(len(podList.Items))
podForNodeCandidate := podList.Items[randomIndex]
return &podForNodeCandidate, nil
}
// CheckApplicationStatus checks the status of the AUT
func CheckApplicationStatus(appNs, appLabel string, timeout, delay int, clients clients.ClientSets) error {
// Checking whether application containers are in ready state
log.Info("[Status]: Checking whether application containers are in ready state")
err := status.CheckContainerStatus(appNs, appLabel, timeout, delay, clients)
if err != nil {
return err
}
// Checking whether application pods are in running or completed state
log.Info("[Status]: Checking whether application pods are in running or completed state")
err = status.CheckPodStatusPhase(appNs, appLabel, timeout, delay, clients, "Running", "Completed")
if err != nil {
return err
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
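The helper's single argument is the shell command assembled by the Sprintf above; the snippet below shows the expansion for sample inputs, where the key paths, user, IP and reboot command are placeholders rather than values taken from the experiment.
package main

import "fmt"

func main() {
	privateKeyPath := "/mnt/ssh-privatekey" // hypothetical secret mount path
	emptyDirPath := "/tmp/ssh-privatekey"   // hypothetical writable copy
	sshUser, nodeIP := "root", "10.0.0.12"
	rebootCommand := "sudo shutdown -r now"

	cmd := fmt.Sprintf("cp %[1]s %[2]s && chmod 400 %[2]s && ssh -o \"StrictHostKeyChecking=no\" -o \"UserKnownHostsFile=/dev/null\" -i %[2]s %[3]s@%[4]s %[5]s",
		privateKeyPath, emptyDirPath, sshUser, nodeIP, rebootCommand)
	fmt.Println(cmd)
}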
// getInternalIP gets the internal ip of the given node
func getInternalIP(nodeName string, clients clients.ClientSets) (string, error) {
node, err := clients.GetNode(nodeName, 180, 2)
if err != nil {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: err.Error()}
}
for _, addr := range node.Status.Addresses {
if strings.ToLower(string(addr.Type)) == "internalip" {
return addr.Address, nil
}
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: "failed to get the internal ip of the target node"}
}
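The same lookup can be written against the typed apiv1.NodeInternalIP constant instead of a lower-cased string compare; a standalone sketch on a fabricated node:
package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
)

// internalIPOf mirrors getInternalIP using the typed NodeInternalIP constant.
func internalIPOf(node apiv1.Node) (string, bool) {
	for _, addr := range node.Status.Addresses {
		if addr.Type == apiv1.NodeInternalIP {
			return addr.Address, true
		}
	}
	return "", false
}

func main() {
	node := apiv1.Node{Status: apiv1.NodeStatus{Addresses: []apiv1.NodeAddress{
		{Type: apiv1.NodeHostName, Address: "worker-1"},
		{Type: apiv1.NodeInternalIP, Address: "10.0.0.12"},
	}}}
	fmt.Println(internalIPOf(node)) // 10.0.0.12 true
}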

View File

@ -1,6 +1,7 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
@ -8,23 +9,41 @@ import (
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-taint/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
var (
err error
inject, abort chan os.Signal
)
//PrepareNodeTaint contains the prepration steps before chaos injection
func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeTaint contains the preparation steps before chaos injection
func PrepareNodeTaint(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeTaintFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -34,9 +53,9 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
if experimentsDetails.TargetNode == "" {
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, clients)
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@ -46,81 +65,51 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)
// taint the application node
err := TaintNode(experimentsDetails, clients)
if err != nil {
return err
if err := taintNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not taint node")
}
// Verify the status of AUT after reschedule
log.Info("[Status]: Verify the status of AUT after reschedule")
err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("Application status check failed, err: %v", err)
}
// Verify the status of Auxiliary Applications after reschedule
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running")
err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("Auxiliary Applications status check failed, err: %v", err)
if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
// updating the chaosresult after stopped
failStep := "Node Taint injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
if err = RemoveTaintFromNode(experimentsDetails, clients); err != nil {
log.Errorf("unable to remove taint from the node, err :%v", err)
}
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
// remove taint from the application node
err = RemoveTaintFromNode(experimentsDetails, clients)
if err != nil {
return err
}
// Checking the status of target nodes
log.Info("[Status]: Getting the status of target nodes")
err = status.CheckNodeStatus(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
log.Warnf("Target nodes are not in the ready state, you may need to manually recover the node, err: %v", err)
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running")
if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
return err
}
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment")
// remove taint from the application node
if err := removeTaintFromNode(experimentsDetails, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not remove taint from node")
}
//Waiting for the ramp time after chaos injection
@ -131,107 +120,136 @@ loop:
return nil
}
// TaintNode taint the application node
func TaintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
// taintNode taints the application node
func taintNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeTaintFault")
defer span.End()
// get the taint labels & effect
TaintKey, TaintValue, TaintEffect := GetTaintDetails(experimentsDetails)
taintKey, taintValue, taintEffect := getTaintDetails(experimentsDetails)
log.Infof("Add %v taints to the %v node", TaintKey+"="+TaintValue+":"+TaintEffect, experimentsDetails.TargetNode)
log.Infof("Add %v taints to the %v node", taintKey+"="+taintValue+":"+taintEffect, experimentsDetails.TargetNode)
// get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil || node == nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.TargetNode, err)
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
// check if the taint already exists
tainted := false
for _, taint := range node.Spec.Taints {
if taint.Key == TaintKey {
if taint.Key == taintKey {
tainted = true
break
}
}
if !tainted {
node.Spec.Taints = append(node.Spec.Taints, apiv1.Taint{
Key: TaintKey,
Value: TaintValue,
Effect: apiv1.TaintEffect(TaintEffect),
})
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
if !tainted {
node.Spec.Taints = append(node.Spec.Taints, apiv1.Taint{
Key: taintKey,
Value: taintValue,
Effect: apiv1.TaintEffect(taintEffect),
})
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(node)
if err != nil || updatedNodeWithTaint == nil {
return fmt.Errorf("failed to update %v node after adding taints, err: %v", experimentsDetails.TargetNode, err)
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to add taints: %s", err.Error())}
}
}
}
log.Infof("Successfully added taint in %v node", experimentsDetails.TargetNode)
common.SetTargets(node.Name, "injected", "node", chaosDetails)
log.Infof("Successfully added taint in %v node", experimentsDetails.TargetNode)
}
return nil
}
// RemoveTaintFromNode remove the taint from the application node
func RemoveTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
// removeTaintFromNode removes the taint from the application node
func removeTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// Get the taint key
TaintLabel := strings.Split(experimentsDetails.Taints, ":")
TaintKey := strings.Split(TaintLabel[0], "=")[0]
taintLabel := strings.Split(experimentsDetails.Taints, ":")
taintKey := strings.Split(taintLabel[0], "=")[0]
// get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil || node == nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.TargetNode, err)
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
// check if the taint already exists
tainted := false
for _, taint := range node.Spec.Taints {
if taint.Key == TaintKey {
if taint.Key == taintKey {
tainted = true
break
}
}
if tainted {
var Newtaints []apiv1.Taint
var newTaints []apiv1.Taint
// remove all the taints with matching key
for _, taint := range node.Spec.Taints {
if taint.Key != TaintKey {
Newtaints = append(Newtaints, taint)
if taint.Key != taintKey {
newTaints = append(newTaints, taint)
}
}
node.Spec.Taints = Newtaints
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(node)
if err != nil || updatedNodeWithTaint == nil {
return fmt.Errorf("failed to update %v node after removing taints, err: %v", experimentsDetails.TargetNode, err)
node.Spec.Taints = newTaints
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to remove taints: %s", err.Error())}
}
}
common.SetTargets(node.Name, "reverted", "node", chaosDetails)
log.Infof("Successfully removed taint from the %v node", node.Name)
return nil
}
// GetTaintDetails return the key, value and effect for the taint
func GetTaintDetails(experimentsDetails *experimentTypes.ExperimentDetails) (string, string, string) {
TaintValue := "node-taint"
TaintEffect := string(apiv1.TaintEffectNoExecute)
func getTaintDetails(experimentsDetails *experimentTypes.ExperimentDetails) (string, string, string) {
taintValue := "node-taint"
taintEffect := string(apiv1.TaintEffectNoExecute)
Taints := strings.Split(experimentsDetails.Taints, ":")
TaintLabel := strings.Split(Taints[0], "=")
TaintKey := TaintLabel[0]
taints := strings.Split(experimentsDetails.Taints, ":")
taintLabel := strings.Split(taints[0], "=")
taintKey := taintLabel[0]
// It will set the value for taint label from `TAINT` env, if provided
// otherwise it will use the `node-taint` value as default value.
if len(TaintLabel) >= 2 {
TaintValue = TaintLabel[1]
if len(taintLabel) >= 2 {
taintValue = taintLabel[1]
}
// It will set the value for taint effect from `TAINT` env, if provided
// otherwise it will use `NoExecute` value as default value.
if len(Taints) >= 2 {
TaintEffect = Taints[1]
if len(taints) >= 2 {
taintEffect = taints[1]
}
return TaintKey, TaintValue, TaintEffect
return taintKey, taintValue, taintEffect
}
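Concretely, the key=value:effect parsing above behaves as in the standalone sketch below; the sample taint strings are illustrative.
package main

import (
	"fmt"
	"strings"
)

// parseTaint mirrors getTaintDetails: value defaults to "node-taint" and
// effect defaults to "NoExecute" when they are not supplied.
func parseTaint(spec string) (key, value, effect string) {
	value, effect = "node-taint", "NoExecute"
	parts := strings.Split(spec, ":")
	kv := strings.Split(parts[0], "=")
	key = kv[0]
	if len(kv) >= 2 {
		value = kv[1]
	}
	if len(parts) >= 2 {
		effect = parts[1]
	}
	return key, value, effect
}

func main() {
	fmt.Println(parseTaint("litmus/chaos=inject:NoSchedule")) // litmus/chaos inject NoSchedule
	fmt.Println(parseTaint("litmus/chaos"))                   // litmus/chaos node-taint NoExecute
}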
// abortWatcher continuously watches for the abort signal
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails) {
// waiting till the abort signal is received
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err := removeTaintFromNode(experimentsDetails, clients, chaosDetails); err != nil {
log.Errorf("Unable to untaint node, err: %v", err)
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(0)
}

View File

@ -1,19 +1,24 @@
package lib
import (
"math"
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-autoscaler/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/math"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
@ -21,8 +26,6 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
appsv1 "k8s.io/client-go/kubernetes/typed/apps/v1"
retries "k8s.io/client-go/util/retry"
"github.com/pkg/errors"
)
var (
@ -31,8 +34,10 @@ var (
appsv1StatefulsetClient appsv1.StatefulSetInterface
)
//PreparePodAutoscaler contains the prepration steps before chaos injection
func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PreparePodAutoscaler contains the preparation steps and chaos injection steps
func PreparePodAutoscaler(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodAutoscalerFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -47,9 +52,9 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
switch strings.ToLower(experimentsDetails.AppKind) {
case "deployment", "deployments":
appsUnderTest, err := GetDeploymentDetails(experimentsDetails, clients)
appsUnderTest, err := getDeploymentDetails(experimentsDetails)
if err != nil {
return errors.Errorf("Unable to get the name & replicaCount of the deployment, err: %v", err)
return stacktrace.Propagate(err, "could not get deployment details")
}
deploymentList := []string{}
@ -57,54 +62,50 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
deploymentList = append(deploymentList, deployment.AppName)
}
log.InfoWithValues("[Info]: Details of Deployments under chaos injection", logrus.Fields{
"No. Of Deployments": len(deploymentList),
"Target Deployments": deploymentList,
"Number Of Deployment": len(deploymentList),
"Target Deployments": deploymentList,
})
//calling go routine which will continuously watch for the abort signal
go AbortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
err = PodAutoscalerChaosInDeployment(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
if err != nil {
return errors.Errorf("Unable to perform autoscaling, err: %v", err)
if err = podAutoscalerChaosInDeployment(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not scale deployment")
}
err = AutoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest)
if err != nil {
return errors.Errorf("Unable to rollback the autoscaling, err: %v", err)
if err = autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert scaling in deployment")
}
case "statefulset", "statefulsets":
appsUnderTest, err := GetStatefulsetDetails(experimentsDetails, clients)
appsUnderTest, err := getStatefulsetDetails(experimentsDetails)
if err != nil {
return errors.Errorf("Unable to get the name & replicaCount of the statefulset, err: %v", err)
return stacktrace.Propagate(err, "could not get statefulset details")
}
stsList := []string{}
var stsList []string
for _, sts := range appsUnderTest {
stsList = append(stsList, sts.AppName)
}
log.InfoWithValues("[Info]: Details of Statefulsets under chaos injection", logrus.Fields{
"No. Of Statefulsets": len(stsList),
"Target Statefulsets": stsList,
"Number Of Statefulsets": len(stsList),
"Target Statefulsets": stsList,
})
//calling go routine which will continuously watch for the abort signal
go AbortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
err = PodAutoscalerChaosInStatefulset(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
if err != nil {
return errors.Errorf("Unable to perform autoscaling, err: %v", err)
if err = podAutoscalerChaosInStatefulset(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not scale statefulset")
}
err = AutoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest)
if err != nil {
return errors.Errorf("Unable to rollback the autoscaling, err: %v", err)
if err = autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert scaling in statefulset")
}
default:
return errors.Errorf("application type '%s' is not supported for the chaos", experimentsDetails.AppKind)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{kind: %s}", experimentsDetails.AppKind), Reason: "application type is not supported"}
}
//Waiting for the ramp time after chaos injection
@ -115,380 +116,329 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
return nil
}
func getSliceOfTotalApplicationsTargeted(appList []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
func getSliceOfTotalApplicationsTargeted(appList []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails) []experimentTypes.ApplicationUnderTest {
slice := int(math.Round(float64(len(appList)*experimentsDetails.AppAffectPercentage) / float64(100)))
if slice < 0 || slice > len(appList) {
return nil, errors.Errorf("slice of applications to target out of range %d/%d", slice, len(appList))
}
return appList[:slice], nil
newAppListLength := math.Maximum(1, math.Adjustment(math.Minimum(experimentsDetails.AppAffectPercentage, 100), len(appList)))
return appList[:newAppListLength]
}
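
getSliceOfTotalApplicationsTargeted above now drops the error return and leans on the litmus math helpers. A rough standalone sketch of the intended arithmetic, assuming math.Adjustment rounds the percentage of the list length and Minimum/Maximum clamp the inputs (the targetCount name is illustrative, not the litmus implementation):

package lib

import "math"

// targetCount mirrors the selection sketched above: clamp the affected
// percentage to 100, round it against the total app count, and always
// target at least one application.
func targetCount(total, affectedPercentage int) int {
	if affectedPercentage > 100 {
		affectedPercentage = 100 // mirrors math.Minimum(perc, 100)
	}
	n := int(math.Round(float64(total*affectedPercentage) / 100.0))
	if n < 1 {
		n = 1 // mirrors math.Maximum(1, ...)
	}
	return n
}

With 10 deployments and APP_AFFECTED_PERC=25 this selects 3 targets; a percentage of 0 still targets a single application.
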
//GetDeploymentDetails is used to get the name and total number of replicas of the deployment
func GetDeploymentDetails(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) ([]experimentTypes.ApplicationUnderTest, error) {
// getDeploymentDetails is used to get the name and total number of replicas of the deployment
func getDeploymentDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
deploymentList, err := appsv1DeploymentClient.List(metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(deploymentList.Items) == 0 {
return nil, errors.Errorf("Unable to find the deployments with matching labels, err: %v", err)
deploymentList, err := appsv1DeploymentClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(deploymentList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: "no deployment found with matching labels"}
}
appsUnderTest := []experimentTypes.ApplicationUnderTest{}
var appsUnderTest []experimentTypes.ApplicationUnderTest
for _, app := range deploymentList.Items {
log.Infof("[DeploymentDetails]: Found deployment name %s with replica count %d", app.Name, int(*app.Spec.Replicas))
log.Infof("[Info]: Found deployment name '%s' with replica count '%d'", app.Name, int(*app.Spec.Replicas))
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
}
// Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails)
// Applying the APP_AFFECTED_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
}
//GetStatefulsetDetails is used to get the name and total number of replicas of the statefulsets
func GetStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) ([]experimentTypes.ApplicationUnderTest, error) {
// getStatefulsetDetails is used to get the name and total number of replicas of the statefulsets
func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
statefulsetList, err := appsv1StatefulsetClient.List(metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(statefulsetList.Items) == 0 {
return nil, errors.Errorf("Unable to find the statefulsets with matching labels, err: %v", err)
statefulsetList, err := appsv1StatefulsetClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(statefulsetList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: "no statefulset found with matching labels"}
}
appsUnderTest := []experimentTypes.ApplicationUnderTest{}
for _, app := range statefulsetList.Items {
log.Infof("[DeploymentDetails]: Found statefulset name %s with replica count %d", app.Name, int(*app.Spec.Replicas))
log.Infof("[Info]: Found statefulset name '%s' with replica count '%d'", app.Name, int(*app.Spec.Replicas))
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
}
// Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails)
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
}
//PodAutoscalerChaosInDeployment scales up the replicas of deployment and verify the status
func PodAutoscalerChaosInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// podAutoscalerChaosInDeployment scales up the replicas of the deployment and verifies the status
func podAutoscalerChaosInDeployment(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
for _, app := range appsUnderTest {
// Retrieve the latest version of Deployment before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1DeploymentClient.Get(app.AppName, metav1.GetOptions{})
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Failed to get latest version of Application Deployment, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
}
// modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
log.Infof("Updating deployment %s to number of replicas %d", appUnderTest.ObjectMeta.Name, experimentsDetails.Replicas)
_, err = appsv1DeploymentClient.Update(appUnderTest)
log.Infof("Updating deployment '%s' to number of replicas '%d'", appUnderTest.ObjectMeta.Name, experimentsDetails.Replicas)
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale deployment :%s", err.Error())}
}
common.SetTargets(app.AppName, "injected", "deployment", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("Unable to scale the deployment, err: %v", retryErr)
return retryErr
}
log.Info("Application Started Scaling")
log.Info("[Info]: The application started scaling")
err = DeploymentStatusCheck(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
if err != nil {
return errors.Errorf("Status Check failed, err: %v", err)
}
return nil
return deploymentStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
}
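
The scale-up above wraps the Get/Update pair in retries.RetryOnConflict (apparently client-go's util/retry helper). A minimal, self-contained sketch of the same conflict-aware update against a plain clientset; the scaleDeployment name and the clientset/namespace wiring are assumptions for illustration:

package lib

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// scaleDeployment re-fetches the deployment and re-applies the replica update
// under retry.RetryOnConflict, which backs off exponentially whenever the
// apiserver rejects the update with a resourceVersion conflict.
func scaleDeployment(ctx context.Context, clientset kubernetes.Interface, namespace, name string, replicas int32) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		deploy, err := clientset.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		deploy.Spec.Replicas = &replicas
		_, err = clientset.AppsV1().Deployments(namespace).Update(ctx, deploy, metav1.UpdateOptions{})
		return err
	})
}

Re-fetching inside the closure matters: the retried Update must carry the latest resourceVersion or it will keep conflicting.
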
//PodAutoscalerChaosInStatefulset scales up the replicas of statefulset and verify the status
func PodAutoscalerChaosInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// podAutoscalerChaosInStatefulset scales up the replicas of the statefulset and verifies the status
func podAutoscalerChaosInStatefulset(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
for _, app := range appsUnderTest {
// Retrieve the latest version of Statefulset before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(app.AppName, metav1.GetOptions{})
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Failed to get latest version of Application Statefulset, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
}
// modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
_, err = appsv1StatefulsetClient.Update(appUnderTest)
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale statefulset :%s", err.Error())}
}
common.SetTargets(app.AppName, "injected", "statefulset", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("Unable to scale the statefulset, err: %v", retryErr)
return retryErr
}
log.Info("Application Started Scaling")
log.Info("[Info]: The application started scaling")
err = StatefulsetStatusCheck(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
if err != nil {
return errors.Errorf("Status Check failed, err: %v", err)
}
return nil
return statefulsetStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
}
// DeploymentStatusCheck check the status of deployment and verify the available replicas
func DeploymentStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// deploymentStatusCheck checks the status of the deployment and verifies the available replicas
func deploymentStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Record start timestamp
ChaosStartTimeStamp := time.Now().Unix()
isFailed := false
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
deployment, err := appsv1DeploymentClient.Get(app.AppName, metav1.GetOptions{})
deployment, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to find the deployment with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
log.Infof("Deployment's Available Replica Count is %v", deployment.Status.AvailableReplicas)
if int(deployment.Status.AvailableReplicas) != app.ReplicaCount {
isFailed = true
return errors.Errorf("Application %s is not scaled yet, err: %v", app.AppName, err)
if int(deployment.Status.ReadyReplicas) != experimentsDetails.Replicas {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale deployment, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, deployment.Status.ReadyReplicas)}
}
}
isFailed = false
return nil
})
if isFailed {
err = AutoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest)
if err != nil {
return errors.Errorf("Unable to perform autoscaling, err: %v", err)
}
return errors.Errorf("Failed to scale the application")
}
if err != nil {
return err
if scaleErr := autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
}
return stacktrace.Propagate(err, "failed to scale replicas")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//ChaosCurrentTimeStamp contains the current timestamp
ChaosCurrentTimeStamp := time.Now().Unix()
if int(ChaosCurrentTimeStamp-ChaosStartTimeStamp) <= experimentsDetails.ChaosDuration {
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
if duration < experimentsDetails.ChaosDuration {
log.Info("[Wait]: Waiting for completion of chaos duration")
time.Sleep(time.Duration(experimentsDetails.ChaosDuration-int(ChaosCurrentTimeStamp-ChaosStartTimeStamp)) * time.Second)
time.Sleep(time.Duration(experimentsDetails.ChaosDuration-duration) * time.Second)
}
return nil
}
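
deploymentStatusCheck above now polls Status.ReadyReplicas against the desired count for the chaos window, then sleeps whatever remains of that window. A hedged standalone sketch of the same shape with a fixed-delay loop (names and parameters are illustrative; the experiment itself uses the litmus retry helper):

package lib

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForScaleAndHoldChaos polls the ready replica count at a fixed delay for
// the chaos duration, then holds the scaled state for whatever is left of it.
func waitForScaleAndHoldChaos(ctx context.Context, clientset kubernetes.Interface, namespace, name string, desired int32, chaosDuration, delay time.Duration) error {
	start := time.Now()
	for {
		deploy, err := clientset.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if deploy.Status.ReadyReplicas == desired {
			break
		}
		if time.Since(start) >= chaosDuration {
			return fmt.Errorf("deployment %s not scaled to %d within %s", name, desired, chaosDuration)
		}
		time.Sleep(delay)
	}
	// hold the scaled state for the remainder of the chaos window
	if remaining := chaosDuration - time.Since(start); remaining > 0 {
		time.Sleep(remaining)
	}
	return nil
}
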
// StatefulsetStatusCheck check the status of statefulset and verify the available replicas
func StatefulsetStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// statefulsetStatusCheck checks the status of the statefulset and verifies the available replicas
func statefulsetStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Record start timestamp
ChaosStartTimeStamp := time.Now().Unix()
isFailed := false
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
statefulset, err := appsv1StatefulsetClient.Get(app.AppName, metav1.GetOptions{})
statefulset, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to find the statefulset with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
log.Infof("Statefulset's Ready Replica Count is: %v", statefulset.Status.ReadyReplicas)
if int(statefulset.Status.ReadyReplicas) != experimentsDetails.Replicas {
isFailed = true
return errors.Errorf("Application is not scaled yet, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale statefulset, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, statefulset.Status.ReadyReplicas)}
}
}
isFailed = false
return nil
})
if isFailed {
err = AutoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest)
if err != nil {
return errors.Errorf("Unable to perform autoscaling, err: %v", err)
}
return errors.Errorf("Failed to scale the application")
}
if err != nil {
return err
if scaleErr := autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
}
return stacktrace.Propagate(err, "failed to scale replicas")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
//ChaosCurrentTimeStamp contains the current timestamp
ChaosCurrentTimeStamp := time.Now().Unix()
if int(ChaosCurrentTimeStamp-ChaosStartTimeStamp) <= experimentsDetails.ChaosDuration {
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
if duration < experimentsDetails.ChaosDuration {
log.Info("[Wait]: Waiting for completion of chaos duration")
time.Sleep(time.Duration(experimentsDetails.ChaosDuration-int(ChaosCurrentTimeStamp-ChaosStartTimeStamp)) * time.Second)
time.Sleep(time.Duration(experimentsDetails.ChaosDuration-duration) * time.Second)
}
return nil
}
//AutoscalerRecoveryInDeployment rollback the replicas to initial values in deployment
func AutoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest) error {
// autoscalerRecoveryInDeployment rolls back the replicas to their initial values in the deployment
func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
// Scale back to initial number of replicas
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
// Retrieve the latest version of Deployment before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
for _, app := range appsUnderTest {
appUnderTest, err := appsv1DeploymentClient.Get(app.AppName, metav1.GetOptions{})
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Failed to find the latest version of Application Deployment with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1DeploymentClient.Update(appUnderTest)
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in deployment :%s", err.Error())}
}
common.SetTargets(app.AppName, "reverted", "deployment", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("Unable to rollback the deployment, err: %v", retryErr)
}
log.Info("[Info]: Application pod started rolling back")
err = retry.
if retryErr != nil {
return retryErr
}
log.Info("[Info]: Application started rolling back to original replica count")
return retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
applicationDeploy, err := appsv1DeploymentClient.Get(app.AppName, metav1.GetOptions{})
applicationDeploy, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to find the deployment with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(applicationDeploy.Status.AvailableReplicas) != app.ReplicaCount {
log.Infof("Application Available Replica Count is: %v", applicationDeploy.Status.AvailableReplicas)
return errors.Errorf("Unable to rollback to older replica count, err: %v", err)
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("[Info]: Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback deployment scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
}
}
log.Info("[RollBack]: Application rollback to the initial number of replicas")
return nil
})
if err != nil {
return err
}
log.Info("[RollBack]: Application Pod roll back to initial number of replicas")
return nil
}
//AutoscalerRecoveryInStatefulset rollback the replicas to initial values in deployment
func AutoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest) error {
// autoscalerRecoveryInStatefulset rolls back the replicas to their initial values in the statefulset
func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
// Scale back to initial number of replicas
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
for _, app := range appsUnderTest {
// Retrieve the latest version of Statefulset before attempting update
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(app.AppName, metav1.GetOptions{})
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Failed to find the latest version of Statefulset with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1StatefulsetClient.Update(appUnderTest)
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in statefulset :%s", err.Error())}
}
common.SetTargets(app.AppName, "reverted", "statefulset", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("Unable to rollback the statefulset, err: %v", retryErr)
return retryErr
}
log.Info("[Info]: Application pod started rolling back")
err = retry.
return retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Try(func(attempt uint) error {
for _, app := range appsUnderTest {
applicationDeploy, err := appsv1StatefulsetClient.Get(app.AppName, metav1.GetOptions{})
applicationDeploy, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to find the statefulset with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("Application Ready Replica Count is: %v", applicationDeploy.Status.ReadyReplicas)
return errors.Errorf("Unable to roll back to older replica count, err: %v", err)
log.Infof("Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback statefulset scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
}
}
log.Info("[RollBack]: Application roll back to initial number of replicas")
return nil
})
if err != nil {
return err
}
log.Info("[RollBack]: Application Pod roll back to initial number of replicas")
return nil
}
func int32Ptr(i int32) *int32 { return &i }
//AbortPodAutoScalerChaos go routine will continuously watch for the abort signal for the entire chaos duration and generate the required events and result
func AbortPodAutoScalerChaos(appsUnderTest []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// abortPodAutoScalerChaos is run as a goroutine that continuously watches for the abort signal for the entire chaos duration and generates the required events and result
func abortPodAutoScalerChaos(appsUnderTest []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) {
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
for {
select {
case <-signChan:
log.Info("[Chaos]: Chaos Experiment Abortion started because of terminated signal received")
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// waiting till the abort signal received
<-signChan
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
// Note that we are attempting recovery (in this case scaling down to original replica count) after ..
// .. the tasks to patch results & generate events. This is so because the func AutoscalerRecovery..
// ..takes more time to complete - it involves a status check post the downscale. We have a period of ..
// .. few seconds before the pod deletion/removal occurs from the time the TERM is caught and thereby..
// ..run the risk of not updating the status of the objects/create events. With the current approach..
// ..tests indicate we succeed with the downscale/patch call, even if the status checks take longer
// As such, this is a workaround, and other solutions such as usage of pre-stop hooks etc., need to be explored
// Other experiments have simpler "recoveries" that are more or less guaranteed to work.
switch strings.ToLower(experimentsDetails.AppKind) {
case "deployment", "deployments":
if err := AutoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest); err != nil {
log.Errorf("the recovery after abortion failed err: %v", err)
}
case "statefulset", "statefulsets":
if err := AutoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest); err != nil {
log.Errorf("the recovery after abortion failed err: %v", err)
}
default:
return errors.Errorf("application type '%s' is not supported for the chaos", experimentsDetails.AppKind)
}
os.Exit(1)
log.Info("[Chaos]: Revert Started")
// Note that we are attempting recovery (in this case scaling down to the original replica count) after
// the tasks to patch results & generate events. This is because the autoscalerRecovery funcs take more
// time to complete - they involve a status check post the downscale. We only have a period of a few
// seconds between the time the TERM is caught and the pod deletion/removal, and thereby run the risk of
// not updating the status of the objects/creating events. With the current approach, tests indicate we
// succeed with the downscale/patch call even if the status checks take longer.
// As such, this is a workaround, and other solutions such as the usage of pre-stop hooks need to be explored.
// Other experiments have simpler "recoveries" that are more or less guaranteed to work.
switch strings.ToLower(experimentsDetails.AppKind) {
case "deployment", "deployments":
if err := autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
log.Errorf("the recovery after abortion failed err: %v", err)
}
case "statefulset", "statefulsets":
if err := autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
log.Errorf("the recovery after abortion failed err: %v", err)
}
default:
log.Errorf("application type '%s' is not supported for the chaos", experimentsDetails.AppKind)
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
}
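
The abort handler above patches the chaos result and events before attempting the slower recovery, then exits non-zero. A minimal sketch of that trap-and-revert pattern, with revert standing in for the autoscalerRecovery functions and the stdlib logger used only for illustration:

package lib

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// watchForAbort traps SIGINT/SIGTERM, runs the best-effort revert callback,
// and exits with a non-zero code so the runner records the run as stopped.
func watchForAbort(revert func() error) {
	signChan := make(chan os.Signal, 1)
	signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-signChan
		// result/event updates would happen here first; recovery is attempted
		// last because its status check can outlive the termination grace period
		if err := revert(); err != nil {
			log.Printf("recovery after abort failed: %v", err)
		}
		os.Exit(1)
	}()
}
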


@ -0,0 +1,329 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
var inject chan os.Signal
// PrepareCPUExecStress contains the chaos preparation and injection steps
func PrepareCPUExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodCPUHogExecFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
if err := experimentCPU(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not stress cpu")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressCPU uses the REST API to exec into the target container of the target pod
// It keeps increasing the CPU utilisation until it reaches the maximum available or allowed limit
// TOTAL_CHAOS_DURATION specifies how long the experiment should last
func stressCPU(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosInjectCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
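
stressCPU hands the actual exec call to the litmusexec helpers. For orientation, a hedged sketch of how such an exec against the pods/exec subresource is commonly built with client-go's remotecommand package (the execInPod name and the config/clientset plumbing are assumptions, not the litmusexec internals):

package lib

import (
	"bytes"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// execInPod runs a shell command in the given container through the pods/exec
// subresource and returns the captured stdout/stderr.
func execInPod(config *rest.Config, clientset kubernetes.Interface, namespace, pod, container, command string) (string, string, error) {
	req := clientset.CoreV1().RESTClient().Post().
		Resource("pods").
		Name(pod).
		Namespace(namespace).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: container,
			Command:   []string{"/bin/sh", "-c", command},
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)

	executor, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		return "", "", err
	}
	var stdout, stderr bytes.Buffer
	err = executor.Stream(remotecommand.StreamOptions{Stdout: &stdout, Stderr: &stderr})
	return stdout.String(), stderr.String(), err
}
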
// experimentCPU orchestrates the experiment by calling stressCPU for every core of the target container of every targeted pod
func experimentCPU(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the CPU of all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-inject:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range targetPodList.Items {
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"CPU CORE": experimentsDetails.CPUcores,
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// if the stress command returns any error other than exit code 137, skip further execution and mark the result as failed
// exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as passed
// an OOM kill occurs when the memory being stressed exceeds the resource limit of the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert cpu stress")
}
}
}
return nil
}
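
The serial injector above turns on a three-way select: the stress goroutine's error channel, the abort signal, and the chaos-duration timer. A condensed, hedged sketch of that loop, with stress and revert as assumed stand-ins for stressCPU and killStressCPUSerial:

package lib

import (
	"fmt"
	"os"
	"os/signal"
	"strings"
	"syscall"
	"time"
)

// runStressWindow races the stress goroutine against an abort signal and the
// chaos-duration timer, treating exit code 137 (OOM kill) as a pass.
func runStressWindow(duration time.Duration, stress func() error, revert func() error) error {
	stressErr := make(chan error, 1)
	go func() { stressErr <- stress() }()

	abort := make(chan os.Signal, 1)
	signal.Notify(abort, os.Interrupt, syscall.SIGTERM)

	select {
	case err := <-stressErr:
		if err != nil && !strings.Contains(err.Error(), "137") {
			return fmt.Errorf("stress command failed: %w", err)
		}
		return nil
	case <-abort:
		_ = revert() // best-effort cleanup before the forced exit
		os.Exit(1)
	case <-time.After(duration):
		// chaos window elapsed: kill the stress process inside the container
		return revert()
	}
	return nil
}
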
// injectChaosInParallelMode stresses the CPU of all target applications in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-inject:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"CPU CORE": experimentsDetails.CPUcores,
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
}
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// if the stress command returns any error other than exit code 137, skip further execution and mark the result as failed
// exit code 137 (OOM kill) is ignored: further execution is skipped and the result is marked as passed
// an OOM kill occurs when the memory being stressed exceeds the resource limit of the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressCPUParallel(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
return killStressCPUParallel(experimentsDetails, targetPodList, clients, chaosDetails)
}
// killStressCPUSerial function to kill a stress process running inside target container
//
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosKillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, ns), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil
}
// killStressCPUParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressCPUParallel(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
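
killStressCPUParallel folds per-pod failures into a single cerrors.PreserveError rather than bailing at the first error. A small sketch of the error conventions this diff adopts throughout (typed cerrors values at the leaves, stacktrace for root-cause extraction, PreserveError for fan-out); revertOne is an assumed stand-in for killStressCPUSerial:

package lib

import (
	"fmt"
	"strings"

	"github.com/litmuschaos/litmus-go/pkg/cerrors"
	"github.com/palantir/stacktrace"
)

// revertAll attempts cleanup on every target and aggregates the root causes of
// any failures into one PreserveError, so partial reverts are still reported.
func revertAll(targets []string, revertOne func(target string) error) error {
	var errList []string
	for _, t := range targets {
		if err := revertOne(t); err != nil {
			errList = append(errList, stacktrace.RootCause(err).Error())
		}
	}
	if len(errList) != 0 {
		return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
	}
	return nil
}

// a leaf failure would typically be reported as a typed cerrors value
func leafErr(target, reason string) error {
	return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: target, Reason: reason}
}
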


@ -1,270 +0,0 @@
package lib
import (
"os"
"os/signal"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/klog"
)
// StressCPU Uses the REST API to exec into the target container of the target pod
// The function will be constantly increasing the CPU utilisation until it reaches the maximum available or allowed number.
// Using the TOTAL_CHAOS_DURATION we will need to specify for how long this experiment will last
func StressCPU(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets) error {
// It will contains all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosInjectCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to run stress command inside target container, err: %v", err)
}
return nil
}
//ExperimentCPU function orchestrates the experiment by calling the StressCPU function for every core, of every container, of every pod that is targeted
func ExperimentCPU(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
}
return nil
}
// InjectChaosInSerialMode stressed the cpu of all target application serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"CPU CORE": experimentsDetails.CPUcores,
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
go StressCPU(experimentsDetails, pod.Name, clients)
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
err := KillStressCPUSerial(experimentsDetails, pod.Name, clients)
if err != nil {
klog.V(0).Infof("Error in Kill stress after abortion")
return err
}
// updating the chaosresult after stopped
failStep := "CPU hog Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := KillStressCPUSerial(experimentsDetails, pod.Name, clients); err != nil {
return err
}
}
return nil
}
// InjectChaosInParallelMode stressed the cpu of all target application in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"CPU CORE": experimentsDetails.CPUcores,
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
go StressCPU(experimentsDetails, pod.Name, clients)
}
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
err := KillStressCPUParallel(experimentsDetails, targetPodList, clients)
if err != nil {
klog.V(0).Infof("Error in Kill stress after abortion")
return err
}
// updating the chaosresult after stopped
failStep := "CPU hog Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.Summary, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := KillStressCPUParallel(experimentsDetails, targetPodList, clients); err != nil {
return err
}
return nil
}
//PrepareCPUstress contains the steps for prepration before chaos
func PrepareCPUstress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
err := ExperimentCPU(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
//GetTargetContainer will fetch the container name from application pod
// It will return the first container name from the application pod
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
return pod.Spec.Containers[0].Name, nil
}
// KillStressCPUSerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func KillStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets) error {
// It will contains all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosKillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to kill the stress process in %v pod, err: %v", podName, err)
}
return nil
}
// KillStressCPUParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func KillStressCPUParallel(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets) error {
for _, pod := range targetPodList.Items {
if err := KillStressCPUSerial(experimentsDetails, pod.Name, clients); err != nil {
return err
}
}
return nil
}

View File

@ -1,27 +1,33 @@
package lib
import (
"context"
"fmt"
"strconv"
"strings"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/math"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/workloads"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
//PreparePodDelete contains the prepration steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Getting the iteration count for the pod deletion
GetIterations(experimentsDetails)
// PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDeleteFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -29,14 +35,25 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, clients, chaosDetails, eventsDetails); err != nil {
return err
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, clients, chaosDetails, eventsDetails); err != nil {
return err
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -47,27 +64,46 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
return nil
}
// InjectChaosInSerialMode delete the target application pods serial mode(one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails) error {
// injectChaosInSerialMode deletes the target application pods in serial mode (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
GracePeriod := int64(0)
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now().Unix()
for count := 0; count < experimentsDetails.Iterations; count++ {
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
log.Infof("Target pods list, %v", podNames)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
@ -81,38 +117,43 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.InfoWithValues("[Info]: Killing the following pods", logrus.Fields{
"PodName": pod.Name})
if experimentsDetails.Force == true {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
if experimentsDetails.Force {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != 0 {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != "" {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
waitTime, _ := strconv.Atoi(experimentsDetails.ChaosInterval)
common.WaitForDuration(waitTime)
}
}
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
//ChaosCurrentTimeStamp contains the current timestamp
ChaosCurrentTimeStamp := time.Now().Unix()
//ChaosDiffTimeStamp contains the difference of current timestamp and start timestamp
//It will be helpful to track the total chaos duration
chaosDiffTimeStamp := ChaosCurrentTimeStamp - ChaosStartTimeStamp
if int(chaosDiffTimeStamp) >= experimentsDetails.ChaosDuration {
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
@ -122,106 +163,44 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// InjectChaosInParallelMode delete the target application pods in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails) error {
// injectChaosInParallelMode deletes the target application pods in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
GracePeriod := int64(0)
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now().Unix()
for count := 0; count < experimentsDetails.Iterations; count++ {
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
// deriving the parent name of the target resources
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Deleting the application pod
for _, pod := range targetPodList.Items {
log.InfoWithValues("[Info]: Killing the following pods", logrus.Fields{
"PodName": pod.Name})
if experimentsDetails.Force == true {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
}
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != 0 {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
}
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
}
//ChaosCurrentTimeStamp contains the current timestamp
ChaosCurrentTimeStamp := time.Now().Unix()
//ChaosDiffTimeStamp contains the difference of current timestamp and start timestamp
//It will be helpful to track the total chaos duration
chaosDiffTimeStamp := ChaosCurrentTimeStamp - ChaosStartTimeStamp
if int(chaosDiffTimeStamp) >= experimentsDetails.ChaosDuration {
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break
}
}
log.Infof("[Completion]: %v chaos is done", experimentsDetails.ExperimentName)
return nil
}
//GetIterations derive the iterations value from given parameters
func GetIterations(experimentsDetails *experimentTypes.ExperimentDetails) {
var Iterations int
if experimentsDetails.ChaosInterval != 0 {
Iterations = experimentsDetails.ChaosDuration / experimentsDetails.ChaosInterval
} else {
Iterations = 0
}
experimentsDetails.Iterations = math.Maximum(Iterations, 1)
}
//PodDeleteChaos deletes the random single/multiple pods
func PodDeleteChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
GracePeriod := int64(0)
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now().Unix()
for count := 0; count < experimentsDetails.Iterations; count++ {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.EngineName != "" {
@ -236,42 +215,53 @@ func PodDeleteChaos(experimentsDetails *experimentTypes.ExperimentDetails, clien
log.InfoWithValues("[Info]: Killing the following pods", logrus.Fields{
"PodName": pod.Name})
if experimentsDetails.Force == true {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
if experimentsDetails.Force {
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(pod.Name, &v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
}
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != 0 {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
if experimentsDetails.ChaosInterval != "" {
log.Infof("[Wait]: Wait for the chaos interval %vs", experimentsDetails.ChaosInterval)
waitTime, _ := strconv.Atoi(experimentsDetails.ChaosInterval)
common.WaitForDuration(waitTime)
}
}
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
//ChaosCurrentTimeStamp contains the current timestamp
ChaosCurrentTimeStamp := time.Now().Unix()
//ChaosDiffTimeStamp contains the difference of current timestamp and start timestamp
//It will be helpful to track the total chaos duration
chaosDiffTimeStamp := ChaosCurrentTimeStamp - ChaosStartTimeStamp
if int(chaosDiffTimeStamp) >= experimentsDetails.ChaosDuration {
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
log.Infof("[Completion]: %v chaos is done", experimentsDetails.ExperimentName)
return nil
}
// SetChaosTunables will set up a random value within the given range of values
// If the value is not provided as a range, it keeps the initially provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
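Editor's note: the rewritten pod-delete loop above is duration-driven rather than iteration-driven, and CHAOS_INTERVAL now arrives as a string that is parsed on each pass with the parse error silently discarded. The sketch below illustrates that pattern with a hypothetical getInterval helper that falls back to a default instead of ignoring the error; it is not litmus-go code.

package main

import (
    "log"
    "strconv"
    "time"
)

// getInterval parses an interval given in seconds as a string and falls back
// to a default when the input is empty or not a number (hypothetical helper).
func getInterval(interval string, fallback int) int {
    if v, err := strconv.Atoi(interval); err == nil && v > 0 {
        return v
    }
    return fallback
}

func main() {
    chaosDuration := 10 // total chaos duration in seconds
    chaosInterval := "2"
    start := time.Now()
    // keep injecting rounds of chaos until the chaos duration has elapsed
    for int(time.Since(start).Seconds()) < chaosDuration {
        log.Println("injecting one round of chaos")
        time.Sleep(time.Duration(getInterval(chaosInterval, 1)) * time.Second)
    }
    log.Println("chaos duration elapsed")
}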

View File

@ -0,0 +1,298 @@
package helper
import (
"bytes"
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/exec"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-dns-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var (
abort, injectAbort chan os.Signal
err error
)
const (
// ProcessAlreadyKilled contains error code when process is already killed
ProcessAlreadyKilled = "no such process"
)
// Helper injects the dns chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodDNSFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// injectAbort channel is used to transmit signal notifications.
injectAbort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Catch and relay certain signal(s) to injectAbort channel.
signal.Notify(injectAbort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed for the helper pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := preparePodDNSChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// preparePodDNSChaos contains the preparation steps before chaos injection
func preparePodDNSChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-injectAbort:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
done := make(chan error, 1)
for index, t := range targets {
targets[index].Cmd, err = injectChaos(experimentsDetails, t)
if err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := terminateProcess(t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Info("[Wait]: Waiting for chaos completion")
// goroutine to wait for the dns interceptor processes and report their completion on the done channel
go func() {
var errList []string
for _, t := range targets {
if err := t.Cmd.Wait(); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
log.Errorf("err: %v", strings.Join(errList, ", "))
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
return
}
done <- nil
}()
// check the timeout for the command
// Note: timeout will occur when the process has not completed even 30s after the chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the dns interceptor process timed out before completion
log.Infof("[Chaos] The dns interceptor process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the dns interceptor process")
var errList []string
for _, t := range targets {
if err = terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
case doneErr := <-done:
select {
case <-injectAbort:
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
default:
log.Info("[Info]: Reverting Chaos")
var errList []string
for _, t := range targets {
if err := terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return doneErr
}
}
return nil
}
func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, t targetDetails) (*exec.Cmd, error) {
// prepare dns interceptor
var out bytes.Buffer
commandTemplate := fmt.Sprintf("sudo TARGET_PID=%d CHAOS_TYPE=%s SPOOF_MAP='%s' TARGET_HOSTNAMES='%s' CHAOS_DURATION=%d MATCH_SCHEME=%s nsutil -p -n -t %d -- dns_interceptor", t.Pid, experimentsDetails.ChaosType, experimentsDetails.SpoofMap, experimentsDetails.TargetHostNames, experimentsDetails.ChaosDuration, experimentsDetails.MatchScheme, t.Pid)
cmd := exec.Command("/bin/bash", "-c", commandTemplate)
log.Info(cmd.String())
cmd.Stdout = &out
cmd.Stderr = &out
if err = cmd.Start(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: experimentsDetails.ChaosPodName, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to inject chaos: %s", out.String())}
}
return cmd, nil
}
func terminateProcess(t targetDetails) error {
// kill command
killTemplate := fmt.Sprintf("sudo kill %d", t.Cmd.Process.Pid)
kill := exec.Command("/bin/bash", "-c", killTemplate)
var out bytes.Buffer
kill.Stderr = &out
kill.Stdout = &out
if err = kill.Run(); err != nil {
if strings.Contains(strings.ToLower(out.String()), ProcessAlreadyKilled) {
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to revert chaos %s", out.String())}
} else {
log.Infof("dns interceptor process stopped")
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
return nil
}
// abortWatcher continuously watches for the abort signals
func abortWatcher(targets []targetDetails, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("[Abort]: Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
if err = terminateProcess(t); err != nil {
log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "60"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.TargetHostNames = types.Getenv("TARGET_HOSTNAMES", "")
experimentDetails.SpoofMap = types.Getenv("SPOOF_MAP", "")
experimentDetails.MatchScheme = types.Getenv("MATCH_SCHEME", "exact")
experimentDetails.ChaosType = types.Getenv("CHAOS_TYPE", "error")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
CommandPid int
Cmd *exec.Cmd
Source string
}
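Editor's note: terminateProcess above treats a "no such process" response as success so that revert stays idempotent across the abort watcher's three retries. The snippet below is a stripped-down sketch of that idempotent-kill idea; killProcess is a hypothetical helper, not the litmus-go implementation.

package main

import (
    "bytes"
    "fmt"
    "os/exec"
    "strings"
)

// killProcess sends SIGKILL via the shell and ignores the failure when the
// process is already gone, so repeated revert attempts stay harmless.
func killProcess(pid int) error {
    var out bytes.Buffer
    cmd := exec.Command("/bin/sh", "-c", fmt.Sprintf("kill -9 %d", pid))
    cmd.Stdout = &out
    cmd.Stderr = &out
    if err := cmd.Run(); err != nil {
        if strings.Contains(strings.ToLower(out.String()), "no such process") {
            return nil // already killed: treat as success
        }
        return fmt.Errorf("failed to kill pid %d: %s", pid, out.String())
    }
    return nil
}

func main() {
    if err := killProcess(999999); err != nil {
        fmt.Println("revert error:", err)
    } else {
        fmt.Println("process stopped (or already gone)")
    }
}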

View File

@ -0,0 +1,253 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-dns-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDNSFault")
defer span.End()
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the DNS Chaos in all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform DNS Chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode injects the DNS Chaos in all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodDNSFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "cri-socket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"./helpers -name dns-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
MountPath: experimentsDetails.SocketPath,
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
},
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("TARGET_HOSTNAMES", experimentsDetails.TargetHostNames).
SetEnv("SPOOF_MAP", experimentsDetails.SpoofMap).
SetEnv("MATCH_SCHEME", experimentsDetails.MatchScheme).
SetEnv("CHAOS_TYPE", experimentsDetails.ChaosType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}

View File

@ -0,0 +1,308 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-fio-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
// PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodFIOStressFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Fio stress experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressStorage uses the REST API to exec into the target container of the target pod
// The function keeps increasing the storage utilisation until it reaches the maximum available or allowed amount.
// TOTAL_CHAOS_DURATION specifies how long this experiment will last
func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
log.Infof("The storage consumption is: %vM", experimentDetails.Size)
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
fioCmd := fmt.Sprintf("fio --name=testchaos --ioengine=%v --iodepth=%v --rw=%v --bs=%v --size=%vM --numjobs=%v", experimentDetails.IOEngine, experimentDetails.IODepth, experimentDetails.ReadWrite, experimentDetails.BlockSize, experimentDetails.Size, experimentDetails.NumJobs)
if experimentDetails.GroupReporting {
fioCmd += " --group_reporting"
}
log.Infof("Running the command:\n%v", fioCmd)
command := []string{"/bin/sh", "-c", fioCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentDetails.TargetContainer, ns)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
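Editor's note: for illustration only, with hypothetical (not default) values IOEngine=libaio, IODepth=4, ReadWrite=write, BlockSize=4k, Size=256, NumJobs=1 and group reporting enabled, the command assembled by stressStorage would be: fio --name=testchaos --ioengine=libaio --iodepth=4 --rw=write --bs=4k --size=256M --numjobs=1 --group_reporting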
// experimentExecution orchestrates the experiment by calling the stressStorage function for every container of every targeted pod
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the storage of all target applications in serial mode (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInSerialMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Space Consumption(MB)": experimentsDetails.Size,
})
go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// if any error other than exit code 137 is received while executing the stress command, skip further execution and mark the result as fail
// exit code 137 (oom kill) is ignored: further execution is skipped and the result is marked as pass
// an oom kill occurs if the resource to be stressed exceeds the resource limit of the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress the storage of the target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
// injectChaosInParallelMode stresses the storage of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Storage Consumption(MB)": experimentsDetails.Size,
})
go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// if any error other than exit code 137 is received while executing the stress command, skip further execution and mark the result as fail
// exit code 137 (oom kill) is ignored: further execution is skipped and the result is marked as pass
// an oom kill occurs if the resource to be stressed exceeds the resource limit of the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to inject chaos: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break loop
}
}
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
return stacktrace.Propagate(err, "could revert chaos")
}
return nil
}
// killStressSerial function to kill a stress process running inside target container
//
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressSerial(containerName, podName, namespace, KillCmd string, clients clients.ClientSets) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", KillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
return nil
}
// killStressParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressParallel(containerName string, targetPodList corev1.PodList, KillCmd string, clients clients.ClientSets) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressSerial(containerName, pod.Name, pod.Namespace, KillCmd, clients); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
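Editor's note: both injection modes above tolerate exit code 137 so that an OOM-killed stress process is treated as a pass rather than a failure. A minimal sketch of that classification follows; isOOMKill and handleStressErr are hypothetical helpers that check the error text the same way the code above does, not litmus-go functions.

package main

import (
    "errors"
    "fmt"
    "strings"
)

// isOOMKill reports whether an exec error corresponds to exit code 137,
// i.e. the stress process was OOM-killed inside the target container.
func isOOMKill(err error) bool {
    return err != nil && strings.Contains(err.Error(), "137")
}

// handleStressErr mirrors the decision above: ignore OOM kills, fail otherwise.
func handleStressErr(err error) error {
    if err == nil {
        return nil
    }
    if isOOMKill(err) {
        fmt.Println("chaos process OOM killed, marking the run as passed")
        return nil
    }
    return fmt.Errorf("failed to inject chaos: %w", err)
}

func main() {
    fmt.Println(handleStressErr(errors.New("command terminated with exit code 137")))
    fmt.Println(handleStressErr(errors.New("command terminated with exit code 1")))
}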

View File

@ -0,0 +1,334 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
var inject chan os.Signal
// PrepareMemoryExecStress contains the chaos preparation and injection steps
func PrepareMemoryExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodMemoryHogExecFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
if err := experimentMemory(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not stress memory")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressMemory uses the REST API to exec into the target container of the target pod
// The function keeps increasing the memory utilisation until it reaches the maximum available or allowed amount.
// TOTAL_CHAOS_DURATION specifies how long this experiment will last
func stressMemory(MemoryConsumption, containerName, podName, namespace string, clients clients.ClientSets, stressErr chan error) {
log.Infof("The memory consumption is: %v", MemoryConsumption)
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
ddCmd := fmt.Sprintf("dd if=/dev/zero of=/dev/null bs=%vM", MemoryConsumption)
command := []string{"/bin/sh", "-c", ddCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
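Editor's note: for illustration only, with a hypothetical MemoryConsumption of 500 the assembled command is dd if=/dev/zero of=/dev/null bs=500M; dd allocates a buffer of the block size, so the resident memory of the process inside the target container grows to roughly the requested amount.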
// experimentMemory orchestrates the experiment by calling the stressMemory function for every container of every targeted pod
func experimentMemory(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the memory of all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-inject:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range targetPodList.Items {
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// skip further execution if the stress command returned any error other than exit code 137, and mark the result as failed
// exit code 137 (oom kill) is ignored: execution still stops, but the result is marked as passed
// an oom kill occurs when the memory to be stressed exceeds the resource limit of the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert memory stress")
}
}
}
return nil
}
// injectChaosInParallelMode stresses the memory of all target applications in parallel (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-inject:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
//Get the target container name of the application pod
//If no target container is provided, the first container of each target pod is used
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
}
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// skip further execution if the stress command returned any error other than exit code 137, and mark the result as failed
// exit code 137 (oom kill) is ignored: execution still stops, but the result is marked as passed
// an oom kill occurs when the memory to be stressed exceeds the resource limit of the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break loop
}
}
return killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails)
}
// killStressMemorySerial kills the stress process running inside the target container
//
// Triggered by either timeout of the chaos duration or termination of the experiment
func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", memFreeCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil
}
// killStressMemoryParallel kills all the stress processes running inside the target containers
// Triggered by either timeout of the chaos duration or termination of the experiment
func killStressMemoryParallel(containerName string, targetPodList corev1.PodList, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressMemorySerial(containerName, pod.Name, pod.Namespace, memFreeCmd, clients, chaosDetails); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
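Both injection modes above treat an exit status of 137 from the stress command as an OOM kill and mark the run as passed. A minimal, self-contained sketch of that convention (the helper name and values below are illustrative, not part of litmus-go): 137 is 128 plus SIGKILL (9), the status produced when the dd buffer exceeds the container's memory limit and the kernel OOM killer steps in.

package main

import "fmt"

const (
	signalBase = 128 // shells report a fatal signal N as exit status 128+N
	sigKill    = 9   // SIGKILL, which the kernel OOM killer sends
)

// isOOMKill reports whether an exit status corresponds to SIGKILL (128+9),
// i.e. the status the fault above matches by looking for "137" in the error.
func isOOMKill(exitStatus int) bool {
	return exitStatus == signalBase+sigKill
}

func main() {
	fmt.Println(isOOMKill(137)) // true: stop execution, mark the result as passed
	fmt.Println(isOOMKill(1))   // false: treat as an injection failure
}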

View File

@ -1,301 +0,0 @@
package lib
import (
"fmt"
"os"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/klog"
)
var err error
// StressMemory Uses the REST API to exec into the target container of the target pod
// The function will be constantly increasing the Memory utilisation until it reaches the maximum available or allowed number.
// Using the TOTAL_CHAOS_DURATION we will need to specify for how long this experiment will last
func StressMemory(MemoryConsumption, containerName, podName, namespace string, clients clients.ClientSets, stressErr chan error) {
log.Infof("The memory consumption is: %v", MemoryConsumption)
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
ddCmd := fmt.Sprintf("dd if=/dev/zero of=/dev/null bs=" + MemoryConsumption + "M")
command := []string{"/bin/sh", "-c", ddCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
//ExperimentMemory function orchestrates the experiment by calling the StressMemory function, of every container, of every pod that is targeted
func ExperimentMemory(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
}
return nil
}
// InjectChaosInSerialMode stressed the memory of all target application serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go StressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// skipping the execution, if received any error other than 137, while executing stress command and marked result as fail
// it will ignore the error code 137(oom kill), it will skip further execution and marked the result as pass
// oom kill occurs if memory to be stressed exceed than the resource limit for the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
return nil
}
return err
}
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
err = KillStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients)
if err != nil {
klog.V(0).Infof("Error in Kill stress after abortion")
return err
}
// updating the chaosresult after stopped
failStep := "Memory hog Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.Summary, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err = KillStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
}
}
return nil
}
// InjectChaosInParallelMode stressed the memory of all target application in parallel mode (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
for _, pod := range targetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Container": experimentsDetails.TargetContainer,
"Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go StressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr)
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM, syscall.SIGKILL)
loop:
for {
endTime = time.After(timeDelay)
select {
case err := <-stressErr:
// skipping the execution, if received any error other than 137, while executing stress command and marked result as fail
// it will ignore the error code 137(oom kill), it will skip further execution and marked the result as pass
// oom kill occurs if memory to be stressed exceed than the resource limit for the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
return nil
}
return err
}
case <-signChan:
log.Info("[Chaos]: Killing process started because of terminated signal received")
err = KillStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients)
if err != nil {
klog.V(0).Infof("Error in Kill stress after abortion")
return err
}
// updating the chaosresult after stopped
failStep := "Memory hog Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
// generating summary event in chaosengine
msg := experimentsDetails.ExperimentName + " experiment has been aborted"
types.SetEngineEventAttributes(eventsDetails, types.StoppedVerdict, msg, "Warning", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
// generating summary event in chaosresult
types.SetResultEventAttributes(eventsDetails, types.Summary, msg, "Warning", resultDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosResult")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
break loop
}
}
if err = KillStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
}
return nil
}
//PrepareMemoryStress contains the steps for preparation before chaos
func PrepareMemoryStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
err := ExperimentMemory(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
//GetTargetContainer will fetch the container name from application pod
// It will return the first container name from the application pod
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
return pod.Spec.Containers[0].Name, nil
}
// KillStressMemorySerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func KillStressMemorySerial(containerName, podName, namespace, memFreeCmd string, clients clients.ClientSets) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", memFreeCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to kill stress process inside target container, err: %v", err)
}
return nil
}
// KillStressMemoryParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func KillStressMemoryParallel(containerName string, targetPodList corev1.PodList, namespace, memFreeCmd string, clients clients.ClientSets) error {
for _, pod := range targetPodList.Items {
if err := KillStressMemorySerial(containerName, pod.Name, namespace, memFreeCmd, clients); err != nil {
return err
}
}
return nil
}
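The removed file above reported failures with errors.Errorf and a free-form failStep string, while the rewritten libraries in this comparison return typed cerrors.Error values carrying an ErrorCode, Target and Reason that GetRootCauseAndErrorCode later maps to a fail step. Below is a minimal local stand-in for that pattern, not the actual pkg/cerrors implementation:

package main

import "fmt"

// chaosError mirrors the fields used by the cerrors.Error values seen in the
// newer fault libraries; the type itself is only illustrative.
type chaosError struct {
	ErrorCode string // e.g. ErrorTypeChaosRevert
	Target    string // e.g. "{podName: nginx-0, namespace: default}"
	Reason    string // human-readable root cause
}

func (e chaosError) Error() string {
	return fmt.Sprintf("[%s] target: %s, reason: %s", e.ErrorCode, e.Target, e.Reason)
}

func main() {
	err := chaosError{
		ErrorCode: "CHAOS_REVERT",
		Target:    "{podName: nginx-0, namespace: default}",
		Reason:    "failed to kill the stress process",
	}
	// The code, target and reason stay machine-readable, so the caller can map
	// them to a fail step and verdict instead of parsing a formatted string.
	fmt.Println(err)
}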

View File

@ -0,0 +1,297 @@
package lib
import (
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/palantir/stacktrace"
"strings"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"gopkg.in/yaml.v2"
corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/intstr"
)
const (
// AllIPs cidr contains all ips
AllIPs string = "0.0.0.0/0"
)
// NetworkPolicy contains details about the network-policy
type NetworkPolicy struct {
TargetPodLabels map[string]string
PolicyType []networkv1.PolicyType
Egress []networkv1.NetworkPolicyEgressRule
Ingress []networkv1.NetworkPolicyIngressRule
ExceptIPs []string
NamespaceSelector map[string]string
PodSelector map[string]string
Ports []networkv1.NetworkPolicyPort
}
// Port contains the port details
type Port struct {
TCP []int32 `json:"tcp"`
UDP []int32 `json:"udp"`
SCTP []int32 `json:"sctp"`
}
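// For example (illustrative value, not taken from this change), a PORTS env of
// `tcp: [8080,443], udp: [53]` is rewritten by parseCommand into one YAML entry
// per line and then unmarshalled into this struct by setPort below.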
// initialize creates an instance of network policy struct
func initialize() *NetworkPolicy {
return &NetworkPolicy{}
}
// getNetworkPolicyDetails collects all the data required for network policy
func (np *NetworkPolicy) getNetworkPolicyDetails(experimentsDetails *experimentTypes.ExperimentDetails) error {
np.setLabels(experimentsDetails.AppLabel).
setPolicy(experimentsDetails.PolicyTypes).
setPodSelector(experimentsDetails.PodSelector).
setNamespaceSelector(experimentsDetails.NamespaceSelector)
// sets the ports for the traffic control
if err := np.setPort(experimentsDetails.PORTS); err != nil {
return stacktrace.Propagate(err, "could not set port")
}
// sets the destination ips for which the traffic should be blocked
if err := np.setExceptIPs(experimentsDetails); err != nil {
return stacktrace.Propagate(err, "could not set ips")
}
// sets the egress traffic rules
if strings.ToLower(experimentsDetails.PolicyTypes) == "egress" || strings.ToLower(experimentsDetails.PolicyTypes) == "all" {
np.setEgressRules()
}
// sets the ingress traffic rules
if strings.ToLower(experimentsDetails.PolicyTypes) == "ingress" || strings.ToLower(experimentsDetails.PolicyTypes) == "all" {
np.setIngressRules()
}
return nil
}
// setLabels sets the target application label
func (np *NetworkPolicy) setLabels(appLabel string) *NetworkPolicy {
key, value := getKeyValue(appLabel)
if key != "" || value != "" {
np.TargetPodLabels = map[string]string{
key: value,
}
}
return np
}
// getKeyValue returns the key & value from the label
func getKeyValue(label string) (string, string) {
labels := strings.Split(label, "=")
switch {
case len(labels) == 2:
return labels[0], labels[1]
default:
return labels[0], ""
}
}
// setPolicy sets the network policy types
func (np *NetworkPolicy) setPolicy(policy string) *NetworkPolicy {
switch strings.ToLower(policy) {
case "ingress":
np.PolicyType = []networkv1.PolicyType{networkv1.PolicyTypeIngress}
case "egress":
np.PolicyType = []networkv1.PolicyType{networkv1.PolicyTypeEgress}
default:
np.PolicyType = []networkv1.PolicyType{networkv1.PolicyTypeEgress, networkv1.PolicyTypeIngress}
}
return np
}
// setPodSelector sets the pod labels selector
func (np *NetworkPolicy) setPodSelector(podLabel string) *NetworkPolicy {
podSelector := map[string]string{}
labels := strings.Split(podLabel, ",")
for i := range labels {
key, value := getKeyValue(labels[i])
if key != "" || value != "" {
podSelector[key] = value
}
}
np.PodSelector = podSelector
return np
}
// setNamespaceSelector sets the namespace labels selector
func (np *NetworkPolicy) setNamespaceSelector(nsLabel string) *NetworkPolicy {
nsSelector := map[string]string{}
labels := strings.Split(nsLabel, ",")
for i := range labels {
key, value := getKeyValue(labels[i])
if key != "" || value != "" {
nsSelector[key] = value
}
}
np.NamespaceSelector = nsSelector
return np
}
// setPort sets all the protocols and ports
func (np *NetworkPolicy) setPort(p string) error {
var ports []networkv1.NetworkPolicyPort
var port Port
// unmarshal the protocols and ports from the env
if err := yaml.Unmarshal([]byte(strings.TrimSpace(parseCommand(p))), &port); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("failed to unmarshal ports: %s", err.Error())}
}
// sets all the tcp ports
for _, p := range port.TCP {
ports = append(ports, getPort(p, corev1.ProtocolTCP))
}
// sets all the udp ports
for _, p := range port.UDP {
ports = append(ports, getPort(p, corev1.ProtocolUDP))
}
// sets all the sctp ports
for _, p := range port.SCTP {
ports = append(ports, getPort(p, corev1.ProtocolSCTP))
}
np.Ports = ports
return nil
}
// getPort returns the port details
func getPort(port int32, protocol corev1.Protocol) networkv1.NetworkPolicyPort {
networkPorts := networkv1.NetworkPolicyPort{
Protocol: &protocol,
Port: &intstr.IntOrString{
Type: intstr.Int,
IntVal: port,
},
}
return networkPorts
}
// setExceptIPs sets all the destination ips
// for which traffic should be blocked
func (np *NetworkPolicy) setExceptIPs(experimentsDetails *experimentTypes.ExperimentDetails) error {
// get all the target ips
destinationIPs, err := network_chaos.GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients.ClientSets{}, false)
if err != nil {
return stacktrace.Propagate(err, "could not get destination ips")
}
ips := strings.Split(destinationIPs, ",")
var uniqueIps []string
// removing all the duplicates and ipv6 ips from the list, if any
for i := range ips {
isPresent := false
for j := range uniqueIps {
if ips[i] == uniqueIps[j] {
isPresent = true
}
}
if ips[i] != "" && !isPresent && !strings.Contains(ips[i], ":") {
uniqueIps = append(uniqueIps, ips[i]+"/32")
}
}
np.ExceptIPs = uniqueIps
return nil
}
// setIngressRules sets the ingress traffic rules
func (np *NetworkPolicy) setIngressRules() *NetworkPolicy {
if len(np.getPeers()) != 0 || len(np.Ports) != 0 {
np.Ingress = []networkv1.NetworkPolicyIngressRule{
{
From: np.getPeers(),
Ports: np.Ports,
},
}
}
return np
}
// setEgressRules sets the egress traffic rules
func (np *NetworkPolicy) setEgressRules() *NetworkPolicy {
if len(np.getPeers()) != 0 || len(np.Ports) != 0 {
np.Egress = []networkv1.NetworkPolicyEgressRule{
{
To: np.getPeers(),
Ports: np.Ports,
},
}
}
return np
}
// getPeers returns the peers' ip blocks, namespace selectors, and pod selectors
func (np *NetworkPolicy) getPeers() []networkv1.NetworkPolicyPeer {
var peers []networkv1.NetworkPolicyPeer
// sets the namespace selectors
if np.NamespaceSelector != nil && len(np.NamespaceSelector) != 0 {
peers = append(peers, np.getNamespaceSelector())
}
// sets the pod selectors
if np.PodSelector != nil && len(np.PodSelector) != 0 {
peers = append(peers, np.getPodSelector())
}
// sets the ipblocks
if np.ExceptIPs != nil && len(np.ExceptIPs) != 0 {
peers = append(peers, np.getIPBlocks())
}
return peers
}
// getNamespaceSelector builds the namespace selector
func (np *NetworkPolicy) getNamespaceSelector() networkv1.NetworkPolicyPeer {
nsSelector := networkv1.NetworkPolicyPeer{
NamespaceSelector: &v1.LabelSelector{
MatchLabels: np.NamespaceSelector,
},
}
return nsSelector
}
// getPodSelector builds the pod selectors
func (np *NetworkPolicy) getPodSelector() networkv1.NetworkPolicyPeer {
podSelector := networkv1.NetworkPolicyPeer{
PodSelector: &v1.LabelSelector{
MatchLabels: np.PodSelector,
},
}
return podSelector
}
// getIPBlocks builds the ipblocks
func (np *NetworkPolicy) getIPBlocks() networkv1.NetworkPolicyPeer {
ipBlocks := networkv1.NetworkPolicyPeer{
IPBlock: &networkv1.IPBlock{
CIDR: AllIPs,
Except: np.ExceptIPs,
},
}
return ipBlocks
}
// parseCommand parses the comma-separated protocols and ports into newline-separated YAML
func parseCommand(command string) string {
final := ""
c := strings.Split(command, ", ")
for i := range c {
final = final + strings.TrimSpace(c[i]) + "\n"
}
return final
}
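When neither ports nor destination IPs/hosts are supplied, getPeers returns nothing, so setEgressRules and setIngressRules leave the rule lists empty and the generated object becomes a full deny-all partition for the selected pods; when destination IPs are supplied, they land in the Except list of an allow-all (0.0.0.0/0) IPBlock peer, so only traffic to those addresses is cut. A minimal sketch of the deny-all shape (the name and labels here are illustrative, not the ones the fault generates):

package main

import (
	"fmt"

	networkv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	denyAll := networkv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: "pod-network-partition-np-demo"},
		Spec: networkv1.NetworkPolicySpec{
			// Selects the target application pods; with both policy types set
			// and no rules, all of their ingress and egress traffic is blocked.
			PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"app": "nginx"}},
			PolicyTypes: []networkv1.PolicyType{networkv1.PolicyTypeIngress, networkv1.PolicyTypeEgress},
		},
	}
	fmt.Printf("%+v\n", denyAll)
}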

View File

@ -0,0 +1,260 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var (
inject, abort chan os.Signal
)
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkPartitionFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// validate the appLabels
if chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide the appLabel"}
}
// Get the target pod details for the chaos execution
targetPodList, err := common.GetPodList("", 100, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
// generate a unique string
runID := stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// collect all the data for the network policy
np := initialize()
if err := np.getNetworkPolicyDetails(experimentsDetails); err != nil {
return stacktrace.Propagate(err, "could not get network policy details")
}
//DISPLAY THE NETWORK POLICY DETAILS
log.InfoWithValues("The Network policy details are as follows", logrus.Fields{
"Target Label": np.TargetPodLabels,
"Policy Type": np.PolicyType,
"PodSelector": np.PodSelector,
"NamespaceSelector": np.NamespaceSelector,
"Destination IPs": np.ExceptIPs,
"Ports": np.Ports,
})
// watching for the abort signal and reverting the chaos
go abortWatcher(experimentsDetails, clients, chaosDetails, resultDetails, &targetPodList, runID)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// creating the network policy to block the traffic
if err := createNetworkPolicy(ctx, experimentsDetails, clients, np, runID); err != nil {
return stacktrace.Propagate(err, "could not create network policy")
}
// updating chaos status to injected for the target pods
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
}
}
// verify the presence of network policy inside cluster
if err := checkExistenceOfPolicy(experimentsDetails, clients, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return stacktrace.Propagate(err, "could not check existence of network policy")
}
log.Infof("[Wait]: Wait for %v chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// deleting the network policy after chaos duration over
if err := deleteNetworkPolicy(experimentsDetails, clients, &targetPodList, chaosDetails, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return stacktrace.Propagate(err, "could not delete network policy")
}
// updating chaos status to reverted for the target pods
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createNetworkPolicy creates the network policy in the application namespace
// it blocks ingress/egress traffic for the targeted application for specific/all IPs
func createNetworkPolicy(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, networkPolicy *NetworkPolicy, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkPartitionFault")
defer span.End()
np := &networkv1.NetworkPolicy{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-np-" + runID,
Namespace: experimentsDetails.AppNS,
Labels: map[string]string{
"name": experimentsDetails.ExperimentName + "-np-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
},
Spec: networkv1.NetworkPolicySpec{
PodSelector: v1.LabelSelector{
MatchLabels: networkPolicy.TargetPodLabels,
},
PolicyTypes: networkPolicy.PolicyType,
Egress: networkPolicy.Egress,
Ingress: networkPolicy.Ingress,
},
}
_, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Create(context.Background(), np, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to create network policy: %s", err.Error())}
}
return nil
}
// deleteNetworkPolicy deletes the network policy and waits until it is deleted completely
func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, targetPodList *corev1.PodList, chaosDetails *types.ChaosDetails, timeout, delay int, runID string) error {
name := experimentsDetails.ExperimentName + "-np-" + runID
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
if err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Delete(context.Background(), name, v1.DeleteOptions{}); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{name: %s, namespace: %s}", name, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to delete network policy: %s", err.Error())}
}
err := retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) != 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "network policies are not deleted within timeout"}
}
return nil
})
if err != nil {
return err
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
return nil
}
// checkExistenceOfPolicy validates the presence of the network policy inside the application namespace
func checkExistenceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, timeout, delay int, runID string) error {
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
return retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "no network policy found with matching labels"}
}
return nil
})
}
// abortWatcher continuously watches for the abort signals
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, targetPodList *corev1.PodList, runID string) {
// waiting till the abort signal received
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err := checkExistenceOfPolicy(experimentsDetails, clients, 2, 1, runID); err != nil {
if error, ok := err.(cerrors.Error); ok {
if strings.Contains(error.Reason, "no network policy found with matching labels") {
break
}
}
log.Infof("no active network policy found, err: %v", err.Error())
retry--
continue
}
if err := deleteNetworkPolicy(experimentsDetails, clients, targetPodList, chaosDetails, 2, 1, runID); err != nil {
log.Errorf("unable to delete network policy, err: %v", err)
}
retry--
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("Chaos Revert Completed")
os.Exit(0)
}
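checkExistenceOfPolicy and deleteNetworkPolicy both lean on the retry helper to poll the API server roughly timeout/delay times, sleeping delay seconds between attempts. A small self-contained stand-in for that polling pattern (pollUntil is a hypothetical helper, not the pkg/utils/retry API):

package main

import (
	"errors"
	"fmt"
	"time"
)

// pollUntil runs check up to timeout/delay times, sleeping delay seconds
// between failed attempts, and surfaces the last error if it never succeeds.
func pollUntil(timeout, delay int, check func(attempt uint) error) error {
	var lastErr error
	for attempt := uint(0); attempt < uint(timeout/delay); attempt++ {
		if lastErr = check(attempt); lastErr == nil {
			return nil
		}
		time.Sleep(time.Duration(delay) * time.Second)
	}
	return lastErr
}

func main() {
	// Example: succeed on the third attempt, mirroring how the fault waits for
	// the network policy to appear (or disappear) before moving on.
	err := pollUntil(6, 2, func(attempt uint) error {
		if attempt < 2 {
			return errors.New("network policy not found yet")
		}
		return nil
	})
	fmt.Println("poll result:", err)
}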

View File

@ -0,0 +1,260 @@
package lib
import (
"fmt"
"go.opentelemetry.io/otel"
"golang.org/x/net/context"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/rds"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
)
var (
err error
inject, abort chan os.Signal
)
func PrepareRDSInstanceStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRDSInstanceStop")
defer span.End()
// Inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// Abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Get the instance identifier or list of instance identifiers
instanceIdentifierList := strings.Split(experimentsDetails.RDSInstanceIdentifier, ",")
if experimentsDetails.RDSInstanceIdentifier == "" || len(instanceIdentifierList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no RDS instance identifier found to stop"}
}
instanceIdentifierList = common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, instanceIdentifierList)
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIdentifierList))
// Watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIdentifierList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode will stop the rds instances in serial mode, i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection began
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode will stop the rds instances in parallel mode, i.e. all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection began
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Power off the instances
for _, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
// Run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
for _, identifier := range instanceIdentifierList {
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, identifier := range instanceIdentifierList {
instanceState, err := awslib.GetRDSInstanceStatus(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" {
log.Info("[Abort]: Waiting for the RDS instance to get down")
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting RDS instance as abort signal received")
err := awslib.RDSInstanceStart(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("RDS instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
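Both modes above wrap the stop/wait/start sequence in a loop that re-measures the elapsed time against the chaos duration and keeps the instances stopped for one chaos interval per cycle. A condensed sketch of that control flow (stopInstance and startInstance are placeholders, not the awslib API):

package main

import (
	"fmt"
	"time"
)

// stopInstance and startInstance stand in for the RDS stop/start calls.
func stopInstance(id string) error  { fmt.Println("stopping", id); return nil }
func startInstance(id string) error { fmt.Println("starting", id); return nil }

// runCycle keeps stopping and restarting the instances until the total elapsed
// time crosses the chaos duration, mirroring the loop in the fault above.
func runCycle(ids []string, chaosDuration, chaosInterval int) error {
	start := time.Now()
	for int(time.Since(start).Seconds()) < chaosDuration {
		for _, id := range ids {
			if err := stopInstance(id); err != nil {
				return err
			}
		}
		// Instances stay down for one chaos interval before being restarted.
		time.Sleep(time.Duration(chaosInterval) * time.Second)
		for _, id := range ids {
			if err := startInstance(id); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	_ = runCycle([]string{"db-1", "db-2"}, 2, 1)
}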

View File

@ -0,0 +1,76 @@
package lib
import (
"context"
"fmt"
"time"
redfishLib "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish-node-restart/types"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
// injectChaos initiates node restart chaos on the target node
func injectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectRedfishNodeRestartFault")
defer span.End()
URL := fmt.Sprintf("https://%v/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset", experimentsDetails.IPMIIP)
return redfishLib.RebootNode(URL, experimentsDetails.User, experimentsDetails.Password)
}
// experimentExecution function orchestrates the experiment by calling the injectChaos function
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + experimentsDetails.IPMIIP + " node"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if err := injectChaos(ctx, experimentsDetails, clients); err != nil {
return stacktrace.Propagate(err, "chaos injection failed")
}
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
time.Sleep(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
return nil
}
// PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRedfishNodeRestartFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Redfish node restart experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
common.SetTargets(experimentsDetails.IPMIIP, "targeted", "node", chaosDetails)
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}

View File

@ -0,0 +1,403 @@
package lib
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/spring-boot/spring-boot-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
)
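// revertAssault disables every assault type; disableChaosMonkey posts it back to the assaults endpoint to revert any active assault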
var revertAssault = experimentTypes.ChaosMonkeyAssaultRevert{
LatencyActive: false,
KillApplicationActive: false,
CPUActive: false,
MemoryActive: false,
ExceptionsActive: false,
}
// SetTargetPodList selects the targeted pods and adds them to the experimentDetails
func SetTargetPodList(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
var err error
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
}
if experimentsDetails.TargetPodList, err = common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails); err != nil {
return err
}
return nil
}
// PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareSpringBootFault")
defer span.End()
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
log.InfoWithValues("[Info]: Chaos monkeys watchers will be injected to the target pods as follows", logrus.Fields{
"WebClient": experimentsDetails.ChaosMonkeyWatchers.WebClient,
"Service": experimentsDetails.ChaosMonkeyWatchers.Service,
"Component": experimentsDetails.ChaosMonkeyWatchers.Component,
"Repository": experimentsDetails.ChaosMonkeyWatchers.Repository,
"Controller": experimentsDetails.ChaosMonkeyWatchers.Controller,
"RestController": experimentsDetails.ChaosMonkeyWatchers.RestController,
})
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// CheckChaosMonkey verifies that the Spring Boot chaos monkey is reachable in the selected pods
// All pods are checked even if some fail; if at least one pod is in error, the check returns an error
func CheckChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, targetPods corev1.PodList) (bool, error) {
hasErrors := false
targetPodNames := []string{}
for _, pod := range targetPods.Items {
targetPodNames = append(targetPodNames, pod.Name)
endpoint := "http://" + pod.Status.PodIP + ":" + chaosMonkeyPort + chaosMonkeyPath
log.Infof("[Check]: Checking pod: %v (endpoint: %v)", pod.Name, endpoint)
resp, err := http.Get(endpoint)
if err != nil {
log.Errorf("failed to request chaos monkey endpoint on pod %s, %s", pod.Name, err.Error())
hasErrors = true
continue
}
if resp.StatusCode != 200 {
log.Errorf("failed to get chaos monkey endpoint on pod %s (status: %d)", pod.Name, resp.StatusCode)
hasErrors = true
}
}
if hasErrors {
return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{podNames: %s}", targetPodNames), Reason: "failed to check chaos monkey on at least one pod, check logs for details"}
}
return true, nil
}
// enableChaosMonkey enables chaos monkey on selected pods
func enableChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error {
log.Infof("[Chaos]: Enabling Chaos Monkey on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/enable", "", nil) //nolint:bodyclose
if err != nil {
return err
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to enable chaos monkey endpoint (status: %d)", resp.StatusCode)}
}
return nil
}
func setChaosMonkeyWatchers(chaosMonkeyPort string, chaosMonkeyPath string, watchers experimentTypes.ChaosMonkeyWatchers, pod corev1.Pod) error {
log.Infof("[Chaos]: Setting Chaos Monkey watchers on pod: %v", pod.Name)
jsonValue, err := json.Marshal(watchers)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey watchers, %s", err.Error())}
}
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/watchers", "application/json", bytes.NewBuffer(jsonValue))
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set watchers, %s", err.Error())}
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set assault (status: %d)", resp.StatusCode)}
}
return nil
}
func startAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault []byte, pod corev1.Pod) error {
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, assault, pod); err != nil {
return err
}
log.Infof("[Chaos]: Activating Chaos Monkey assault on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults/runtime/attack", "", nil)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to start assault %s", err.Error())}
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to activate runtime attack (status: %d)", resp.StatusCode)}
}
return nil
}
func setChaosMonkeyAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault []byte, pod corev1.Pod) error {
log.Infof("[Chaos]: Setting Chaos Monkey assault on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults", "application/json", bytes.NewBuffer(assault))
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set assault, %s", err.Error())}
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set assault (status: %d)", resp.StatusCode)}
}
return nil
}
// disableChaosMonkey disables chaos monkey on selected pods
func disableChaosMonkey(ctx context.Context, chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error {
log.Infof("[Chaos]: disabling assaults on pod %s", pod.Name)
jsonValue, err := json.Marshal(revertAssault)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey revert-chaos watchers, %s", err.Error())}
}
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, jsonValue, pod); err != nil {
return err
}
log.Infof("[Chaos]: disabling chaos monkey on pod %s", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/disable", "", nil)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to disable assault, %s", err.Error())}
}
if resp.StatusCode != 200 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to disable chaos monkey endpoint (status: %d)", resp.StatusCode)}
}
return nil
}
// injectChaosInSerialMode injects chaos monkey assault on pods in serial mode(one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-signChan:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range experimentsDetails.TargetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
_ = events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: Injecting on target pod", logrus.Fields{
"Target Pod": pod.Name,
})
if err := setChaosMonkeyWatchers(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyWatchers, pod); err != nil {
log.Errorf("[Chaos]: Failed to set watchers, err: %v ", err)
return err
}
if err := startAssault(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyAssault, pod); err != nil {
log.Errorf("[Chaos]: Failed to set assault, err: %v ", err)
return err
}
if err := enableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("[Chaos]: Failed to enable chaos, err: %v ", err)
return err
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
endTime = time.After(timeDelay)
loop:
for {
select {
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
// updating the chaosresult after the chaos is stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
return err
}
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
}
return nil
}
// injectChaosInParallelMode injects chaos monkey assault on pods in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// signChan channel is used to transmit signal notifications.
signChan := make(chan os.Signal, 1)
// Catch and relay certain signal(s) to signChan channel.
signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
select {
case <-signChan:
// stopping the chaos execution, if abort signal received
time.Sleep(10 * time.Second)
os.Exit(0)
default:
for _, pod := range experimentsDetails.TargetPodList.Items {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
_ = events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
"Target Pod": pod.Name,
})
if err := setChaosMonkeyWatchers(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyWatchers, pod); err != nil {
log.Errorf("[Chaos]: Failed to set watchers, err: %v", err)
return err
}
if err := startAssault(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyAssault, pod); err != nil {
log.Errorf("[Chaos]: Failed to set assault, err: %v", err)
return err
}
if err := enableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("[Chaos]: Failed to enable chaos, err: %v", err)
return err
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
}
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
}
loop:
for {
endTime = time.After(timeDelay)
select {
case <-signChan:
log.Info("[Chaos]: Revert Started")
for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
}
// updating the chaosresult after the chaos is stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
log.Infof("[Chaos]: Time is up for experiment: %v", experimentsDetails.ExperimentName)
endTime = nil
break loop
}
}
var errorList []string
for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
errorList = append(errorList, err.Error())
continue
}
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
if len(errorList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("error in disabling chaos monkey, [%s]", strings.Join(errorList, ","))}
}
return nil
}
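
The enable, watcher, assault, and disable helpers above all talk to the Chaos Monkey for Spring Boot actuator over plain HTTP. Below is a minimal, self-contained sketch of the same reachability check that CheckChaosMonkey performs; the pod IP, port, and actuator path are hypothetical placeholders, since the experiment derives them from pod.Status.PodIP and the ChaosMonkeyPort/ChaosMonkeyPath tunables.

package main

import (
    "fmt"
    "net/http"
)

func main() {
    // hypothetical values; the experiment derives these from pod.Status.PodIP,
    // experimentsDetails.ChaosMonkeyPort and experimentsDetails.ChaosMonkeyPath
    podIP, port, path := "10.0.0.12", "8080", "/actuator/chaosmonkey"
    endpoint := "http://" + podIP + ":" + port + path
    resp, err := http.Get(endpoint)
    if err != nil {
        fmt.Printf("chaos monkey endpoint unreachable: %v\n", err)
        return
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        fmt.Printf("chaos monkey not ready, status: %d\n", resp.StatusCode)
        return
    }
    fmt.Println("chaos monkey is reachable")
}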

View File

@ -0,0 +1,697 @@
package helper
import (
"bufio"
"bytes"
"context"
"fmt"
"io"
"os"
"os/exec"
"os/signal"
"path/filepath"
"strconv"
"strings"
"syscall"
"time"
"github.com/containerd/cgroups"
cgroupsv2 "github.com/containerd/cgroups/v2"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
)
// list of cgroup v1 subsystems in a container
var (
cgroupSubsystemList = []string{"cpu", "memory", "systemd", "net_cls",
"net_prio", "freezer", "blkio", "perf_event", "devices", "cpuset",
"cpuacct", "pids", "hugetlb",
}
)
var (
err error
inject, abort chan os.Signal
)
const (
// ProcessAlreadyFinished contains error code when process is finished
ProcessAlreadyFinished = "os: process already finished"
// ProcessAlreadyKilled contains error code when process is already killed
ProcessAlreadyKilled = "no such process"
)
// Helper injects the stress chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodStressFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
resultDetails := types.ResultDetails{}
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Fetching all the ENV passed for the helper pod
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := prepareStressChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// prepareStressChaos contains the chaos preparation and injection steps
func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
// get stressors in list format
stressorList := prepareStressor(experimentsDetails)
if len(stressorList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "fail to prepare stressors"}
}
stressors := strings.Join(stressorList, " ")
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []*targetDetails
for _, t := range targetList.Target {
td := &targetDetails{
Name: t.Name,
Namespace: t.Namespace,
Source: chaosDetails.ChaosPodName,
}
td.TargetContainers, err = common.GetTargetContainers(t.Name, t.Namespace, t.TargetContainer, chaosDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get target containers")
}
td.ContainerIds, err = common.GetContainerIDs(td.Namespace, td.Name, td.TargetContainers, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container ids")
}
for _, cid := range td.ContainerIds {
// extract out the pid of the target container
pid, err := common.GetPID(experimentsDetails.ContainerRuntime, cid, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
td.Pids = append(td.Pids, pid)
}
for i := range td.Pids {
cGroupManagers, err, grpPath := getCGroupManager(td, i)
if err != nil {
return stacktrace.Propagate(err, "could not get cgroup manager")
}
td.GroupPath = grpPath
td.CGroupManagers = append(td.CGroupManagers, cGroupManagers)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"TargetContainers": td.TargetContainers,
})
targets = append(targets, td)
}
// watch for the abort signal and revert the chaos if it is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
done := make(chan error, 1)
for index, t := range targets {
for i := range t.Pids {
cmd, err := injectChaos(t, stressors, i, experimentsDetails.StressType)
if err != nil {
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
targets[index].Cmds = append(targets[index].Cmds, cmd)
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
go func() {
var errList []string
var exitErr error
for _, t := range targets {
for i := range t.Cmds {
if err := t.Cmds[i].Cmd.Wait(); err != nil {
log.Infof("stress process failed, err: %v, out: %v", err, t.Cmds[i].Buffer.String())
if _, ok := err.(*exec.ExitError); ok {
exitErr = err
continue
}
errList = append(errList, err.Error())
}
}
}
if exitErr != nil {
oomKilled, err := checkOOMKilled(targets, clients, exitErr)
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- exitErr
} else {
done <- nil
}
} else if len(errList) != 0 {
oomKilled, err := checkOOMKilled(targets, clients, fmt.Errorf("err: %v", strings.Join(errList, ", ")))
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
} else {
done <- nil
}
} else {
done <- nil
}
}()
// check the timeout for the command
// Note: the timeout occurs if the process has not completed even 30s after the chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
case err := <-done:
if err != nil {
exitErr, ok := err.(*exec.ExitError)
if ok {
status := exitErr.Sys().(syscall.WaitStatus)
if status.Signaled() {
log.Infof("process stopped with signal: %v", status.Signal())
}
if status.Signaled() && status.Signal() == syscall.SIGKILL {
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Source: chaosDetails.ChaosPodName, Reason: "process stopped with SIGKILL signal"}
}
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: chaosDetails.ChaosPodName, Reason: err.Error()}
}
log.Info("[Info]: Reverting Chaos")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
func revertChaosForAllTargets(targets []*targetDetails, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
if err := terminateProcess(targets[i]); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// checkOOMKilled checks if the container within the target pods failed due to an OOMKilled error.
func checkOOMKilled(targets []*targetDetails, clients clients.ClientSets, chaosError error) (bool, error) {
// Check each container in the pod
for i := 0; i < 3; i++ {
for _, t := range targets {
// Fetch the target pod
targetPod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return false, cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace),
Reason: err.Error(),
}
}
for _, c := range targetPod.Status.ContainerStatuses {
if utils.Contains(c.Name, t.TargetContainers) {
// Check for OOMKilled and restart
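// exit code 137 corresponds to SIGKILL (128+9), which is what the kernel OOM killer delivers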
if c.LastTerminationState.Terminated != nil && c.LastTerminationState.Terminated.ExitCode == 137 {
log.Warnf("[Warning]: The target container '%s' of pod '%s' got OOM Killed, err: %v", c.Name, t.Name, chaosError)
return true, nil
}
}
}
}
time.Sleep(1 * time.Second)
}
return false, nil
}
// terminateProcess will remove the stress process from the target container after chaos completion
func terminateProcess(t *targetDetails) error {
var errList []string
for i := range t.Cmds {
if t.Cmds[i] != nil && t.Cmds[i].Cmd.Process != nil {
if err := syscall.Kill(-t.Cmds[i].Cmd.Process.Pid, syscall.SIGKILL); err != nil {
if strings.Contains(err.Error(), ProcessAlreadyKilled) || strings.Contains(err.Error(), ProcessAlreadyFinished) {
continue
}
errList = append(errList, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[i]), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}.Error())
continue
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// prepareStressor will set the required stressors for the given experiment
func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []string {
stressArgs := []string{
"stress-ng",
"--timeout",
strconv.Itoa(experimentDetails.ChaosDuration) + "s",
}
switch experimentDetails.StressType {
case "pod-cpu-stress":
log.InfoWithValues("[Info]: Details of Stressor:", logrus.Fields{
"CPU Core": experimentDetails.CPUcores,
"CPU Load": experimentDetails.CPULoad,
"Timeout": experimentDetails.ChaosDuration,
})
stressArgs = append(stressArgs, "--cpu "+experimentDetails.CPUcores)
stressArgs = append(stressArgs, " --cpu-load "+experimentDetails.CPULoad)
case "pod-memory-stress":
log.InfoWithValues("[Info]: Details of Stressor:", logrus.Fields{
"Number of Workers": experimentDetails.NumberOfWorkers,
"Memory Consumption": experimentDetails.MemoryConsumption,
"Timeout": experimentDetails.ChaosDuration,
})
stressArgs = append(stressArgs, "--vm "+experimentDetails.NumberOfWorkers+" --vm-bytes "+experimentDetails.MemoryConsumption+"M")
case "pod-io-stress":
var hddbytes string
if experimentDetails.FilesystemUtilizationBytes == "0" {
if experimentDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = "10%"
log.Info("Neither of FilesystemUtilizationPercentage or FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = experimentDetails.FilesystemUtilizationPercentage + "%"
}
} else {
if experimentDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = experimentDetails.FilesystemUtilizationBytes + "G"
} else {
hddbytes = experimentDetails.FilesystemUtilizationPercentage + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
log.InfoWithValues("[Info]: Details of Stressor:", logrus.Fields{
"io": experimentDetails.NumberOfWorkers,
"hdd": experimentDetails.NumberOfWorkers,
"hdd-bytes": hddbytes,
"Timeout": experimentDetails.ChaosDuration,
"Volume Mount Path": experimentDetails.VolumeMountPath,
})
if experimentDetails.VolumeMountPath == "" {
stressArgs = append(stressArgs, "--io "+experimentDetails.NumberOfWorkers+" --hdd "+experimentDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes)
} else {
stressArgs = append(stressArgs, "--io "+experimentDetails.NumberOfWorkers+" --hdd "+experimentDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes+" --temp-path "+experimentDetails.VolumeMountPath)
}
if experimentDetails.CPUcores != "0" {
stressArgs = append(stressArgs, "--cpu %v", experimentDetails.CPUcores)
}
default:
log.Fatalf("stressor for %v experiment is not supported", experimentDetails.ExperimentName)
}
return stressArgs
}
// pidPath returns the cgroup path resolver for the container's target process
func pidPath(t *targetDetails, index int) cgroups.Path {
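// each line of /proc/<pid>/cgroup has the form 'hierarchy-id:controller-list:cgroup-path'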
processPath := "/proc/" + strconv.Itoa(t.Pids[index]) + "/cgroup"
paths, err := parseCgroupFile(processPath, t, index)
if err != nil {
return getErrorPath(errors.Wrapf(err, "parse cgroup file %s", processPath))
}
return getExistingPath(paths, t.Pids[index], "")
}
// parseCgroupFile will read and verify the cgroup file entry of a container
func parseCgroupFile(path string, t *targetDetails, index int) (map[string]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to parse cgroup: %s", err.Error())}
}
defer file.Close()
return parseCgroupFromReader(file, t, index)
}
// parseCgroupFromReader will parse the cgroup file from the reader
func parseCgroupFromReader(r io.Reader, t *targetDetails, index int) (map[string]string, error) {
var (
cgroups = make(map[string]string)
s = bufio.NewScanner(r)
)
for s.Scan() {
var (
text = s.Text()
parts = strings.SplitN(text, ":", 3)
)
if len(parts) < 3 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("invalid cgroup entry: %q", text)}
}
for _, subs := range strings.Split(parts[1], ",") {
if subs != "" {
cgroups[subs] = parts[2]
}
}
}
if err := s.Err(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("buffer scanner failed: %s", err.Error())}
}
return cgroups, nil
}
// getExistingPath will be used to get the existing valid cgroup path
func getExistingPath(paths map[string]string, pid int, suffix string) cgroups.Path {
for n, p := range paths {
dest, err := getCgroupDestination(pid, n)
if err != nil {
return getErrorPath(err)
}
rel, err := filepath.Rel(dest, p)
if err != nil {
return getErrorPath(err)
}
if rel == "." {
rel = dest
}
paths[n] = filepath.Join("/", rel)
}
return func(name cgroups.Name) (string, error) {
root, ok := paths[string(name)]
if !ok {
if root, ok = paths[fmt.Sprintf("name=%s", name)]; !ok {
return "", cgroups.ErrControllerNotActive
}
}
if suffix != "" {
return filepath.Join(root, suffix), nil
}
return root, nil
}
}
// getErrorPath returns a cgroups.Path that always yields the given error
func getErrorPath(err error) cgroups.Path {
return func(_ cgroups.Name) (string, error) {
return "", err
}
}
// getCgroupDestination will validate the subsystem with the mountpath in container mountinfo file.
func getCgroupDestination(pid int, subsystem string) (string, error) {
mountinfoPath := fmt.Sprintf("/proc/%d/mountinfo", pid)
file, err := os.Open(mountinfoPath)
if err != nil {
return "", err
}
defer file.Close()
s := bufio.NewScanner(file)
for s.Scan() {
fields := strings.Fields(s.Text())
for _, opt := range strings.Split(fields[len(fields)-1], ",") {
if opt == subsystem {
return fields[3], nil
}
}
}
if err := s.Err(); err != nil {
return "", err
}
return "", errors.Errorf("no destination found for %v ", subsystem)
}
// findValidCgroup will be used to get a valid cgroup path
func findValidCgroup(path cgroups.Path, t *targetDetails, index int) (string, error) {
for _, subsystem := range cgroupSubsystemList {
path, err := path(cgroups.Name(subsystem))
if err != nil {
log.Errorf("fail to retrieve the cgroup path, subsystem: %v, target: %v, err: %v", subsystem, t.ContainerIds[index], err)
continue
}
if strings.Contains(path, t.ContainerIds[index]) {
return path, nil
}
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: "could not find valid cgroup"}
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.CPUcores = types.Getenv("CPU_CORES", "")
experimentDetails.CPULoad = types.Getenv("CPU_LOAD", "")
experimentDetails.FilesystemUtilizationPercentage = types.Getenv("FILESYSTEM_UTILIZATION_PERCENTAGE", "")
experimentDetails.FilesystemUtilizationBytes = types.Getenv("FILESYSTEM_UTILIZATION_BYTES", "")
experimentDetails.NumberOfWorkers = types.Getenv("NUMBER_OF_WORKERS", "")
experimentDetails.MemoryConsumption = types.Getenv("MEMORY_CONSUMPTION", "")
experimentDetails.VolumeMountPath = types.Getenv("VOLUME_MOUNT_PATH", "")
experimentDetails.StressType = types.Getenv("STRESS_TYPE", "")
}
// abortWatcher continuously watches for the abort signals
func abortWatcher(targets []*targetDetails, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("[Abort]: Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
if err = terminateProcess(t); err != nil {
log.Errorf("[Abort]: unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("[Abort]: Unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getCGroupManager returns the cgroup manager for the process with the given pid
func getCGroupManager(t *targetDetails, index int) (interface{}, error, string) {
if cgroups.Mode() == cgroups.Unified {
groupPath := ""
output, err := exec.Command("bash", "-c", fmt.Sprintf("nsenter -t 1 -C -m -- cat /proc/%v/cgroup", t.Pids[index])).CombinedOutput()
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to get the cgroup: %s :%v", err.Error(), output)}, ""
}
log.Infof("cgroup output: %s", string(output))
parts := strings.Split(string(output), ":")
if len(parts) < 3 {
return "", fmt.Errorf("invalid cgroup entry: %s", string(output)), ""
}
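// a cgroup v2 (unified) entry has the form '0::<path>': hierarchy id 0 with an empty controller field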
if strings.HasSuffix(parts[len(parts)-3], "0") && parts[len(parts)-2] == "" {
groupPath = parts[len(parts)-1]
}
log.Infof("group path: %s", groupPath)
cgroup2, err := cgroupsv2.LoadManager("/sys/fs/cgroup", string(groupPath))
if err != nil {
return nil, errors.Errorf("Error loading cgroup v2 manager, %v", err), ""
}
return cgroup2, nil, groupPath
}
path := pidPath(t, index)
cgroup, err := findValidCgroup(path, t, index)
if err != nil {
return nil, stacktrace.Propagate(err, "could not find valid cgroup"), ""
}
cgroup1, err := cgroups.Load(cgroups.V1, cgroups.StaticPath(cgroup))
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to load the cgroup: %s", err.Error())}, ""
}
return cgroup1, nil, ""
}
// addProcessToCgroup will add the process to cgroup
// By default it will add to v1 cgroup
func addProcessToCgroup(pid int, control interface{}, groupPath string) error {
if cgroups.Mode() == cgroups.Unified {
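// on cgroup v2, a process joins a cgroup when its pid is written to <cgroup>/cgroup.procs; nsenter -t 1 -C -m performs the write from the host's cgroup and mount namespaces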
args := []string{"-t", "1", "-C", "--", "sudo", "sh", "-c", fmt.Sprintf("echo %d >> /sys/fs/cgroup%s/cgroup.procs", pid, strings.ReplaceAll(groupPath, "\n", ""))}
output, err := exec.Command("nsenter", args...).CombinedOutput()
if err != nil {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("failed to add process to cgroup %s: %v", string(output), err),
}
}
return nil
}
var cgroup1 = control.(cgroups.Cgroup)
return cgroup1.Add(cgroups.Process{Pid: pid})
}
func injectChaos(t *targetDetails, stressors string, index int, stressType string) (*Command, error) {
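// 'pause' keeps the command suspended until it receives SIGCONT (sent below), while 'nsutil -t <pid> -p' enters the pid namespace of the target process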
stressCommand := fmt.Sprintf("pause nsutil -t %v -p -- %v", strconv.Itoa(t.Pids[index]), stressors)
// for io stress, we need to enter the mount namespace of the target container
// enabling it by passing -m flag
if stressType == "pod-io-stress" {
stressCommand = fmt.Sprintf("pause nsutil -t %v -p -m -- %v", strconv.Itoa(t.Pids[index]), stressors)
}
log.Infof("[Info]: starting process: %v", stressCommand)
// launch the stress-ng process on the target container in paused mode
cmd := exec.Command("/bin/bash", "-c", stressCommand)
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
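// Setpgid gives the stress process its own process group so terminateProcess can kill the whole group via the negative pid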
var buf bytes.Buffer
cmd.Stdout = &buf
cmd.Stderr = &buf
err = cmd.Start()
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("failed to start stress process: %s", err.Error())}
}
// add the stress process to the cgroup of target container
if err = addProcessToCgroup(cmd.Process.Pid, t.CGroupManagers[index], t.GroupPath); err != nil {
if killErr := cmd.Process.Kill(); killErr != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup %s and kill stress process: %s", err.Error(), killErr.Error())}
}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup: %s", err.Error())}
}
log.Info("[Info]: Sending signal to resume the stress process")
// wait for the process to start before sending the resume signal
// TODO: need a dynamic way to check the start of the process
time.Sleep(700 * time.Millisecond)
// un-pause the stress process by sending SIGCONT so that the stress actually starts
if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to remove pause and start the stress process: %s", err.Error())}
}
return &Command{
Cmd: cmd,
Buffer: buf,
}, nil
}
type targetDetails struct {
Name string
Namespace string
TargetContainers []string
ContainerIds []string
Pids []int
CGroupManagers []interface{}
Cmds []*Command
Source string
GroupPath string
}
type Command struct {
Cmd *exec.Cmd
Buffer bytes.Buffer
}
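
Putting prepareStressor and injectChaos together, the helper ultimately runs a stress-ng command inside the target's namespaces. Below is a small standalone sketch of the CPU branch; the core, load, and duration values are hypothetical, and in the helper they come from the CPU_CORES, CPU_LOAD, and TOTAL_CHAOS_DURATION environment variables (see getENV above).

package main

import (
    "fmt"
    "strconv"
    "strings"
)

func main() {
    // hypothetical tunables; the helper reads these from CPU_CORES, CPU_LOAD
    // and TOTAL_CHAOS_DURATION
    cpuCores, cpuLoad, chaosDuration := "1", "100", 60

    stressArgs := []string{
        "stress-ng",
        "--timeout", strconv.Itoa(chaosDuration) + "s",
        "--cpu", cpuCores,
        "--cpu-load", cpuLoad,
    }

    // the helper prefixes this with 'pause nsutil -t <pid> -p --' so that the
    // process starts suspended inside the target's pid namespace
    fmt.Println(strings.Join(stressArgs, " "))
}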

View File

@ -0,0 +1,318 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareAndInjectStressChaos contains the preparation & injection steps for the stress experiments.
func PrepareAndInjectStressChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodStressFault")
defer span.End()
var err error
//Set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
switch experimentsDetails.StressType {
case "pod-cpu-stress":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"CPU Core": experimentsDetails.CPUcores,
"CPU Load Percentage": experimentsDetails.CPULoad,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "pod-memory-stress":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Number of Workers": experimentsDetails.NumberOfWorkers,
"Memory Consumption": experimentsDetails.MemoryConsumption,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "pod-io-stress":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Getting the serviceAccountName; the helper pod needs permission to create events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the stress chaos in all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform the stress chaos
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode injects the stress chaos in all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodStressFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
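// hostPID (together with privileged mode and SYS_ADMIN below) lets the helper resolve target container PIDs and join their namespaces and cgroups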
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
Volumes: []apiv1.Volume{
{
Name: "socket-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
{
Name: "sys-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/sys",
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
},
Args: []string{
"-c",
"./helpers -name stress-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "socket-path",
MountPath: experimentsDetails.SocketPath,
},
{
Name: "sys-path",
MountPath: "/sys",
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
RunAsUser: ptrint64(0),
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env variables required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
SetEnv("CHAOS_UID", string(experimentsDetails.ChaosUID)).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("CPU_CORES", experimentsDetails.CPUcores).
SetEnv("CPU_LOAD", experimentsDetails.CPULoad).
SetEnv("FILESYSTEM_UTILIZATION_PERCENTAGE", experimentsDetails.FilesystemUtilizationPercentage).
SetEnv("FILESYSTEM_UTILIZATION_BYTES", experimentsDetails.FilesystemUtilizationBytes).
SetEnv("NUMBER_OF_WORKERS", experimentsDetails.NumberOfWorkers).
SetEnv("MEMORY_CONSUMPTION", experimentsDetails.MemoryConsumption).
SetEnv("VOLUME_MOUNT_PATH", experimentsDetails.VolumeMountPath).
SetEnv("STRESS_TYPE", experimentsDetails.StressType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
func ptrint64(p int64) *int64 {
return &p
}
// SetChaosTunables will pick a random value within the given range of values
// If the value is not provided as a range, the provided value is used as-is.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.CPUcores = common.ValidateRange(experimentsDetails.CPUcores)
experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)
experimentsDetails.MemoryConsumption = common.ValidateRange(experimentsDetails.MemoryConsumption)
experimentsDetails.NumberOfWorkers = common.ValidateRange(experimentsDetails.NumberOfWorkers)
experimentsDetails.FilesystemUtilizationPercentage = common.ValidateRange(experimentsDetails.FilesystemUtilizationPercentage)
experimentsDetails.FilesystemUtilizationBytes = common.ValidateRange(experimentsDetails.FilesystemUtilizationBytes)
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
}
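
For context, this library and the stress helper agree on a small textual encoding for targets: each target is name:namespace:container, and targets that share a node are joined with ';' before being passed through the TARGETS env (see getPodEnv above); the helper recovers the same list via common.ParseTargets. A minimal sketch of that round trip, using hypothetical pod names:

package main

import (
    "fmt"
    "strings"
)

func main() {
    // hypothetical targets scheduled on the same node, encoded the way
    // injectChaosInParallelMode builds them
    targets := []string{
        "nginx-7d9f:default:nginx",
        "redis-0:cache:redis",
    }
    encoded := strings.Join(targets, ";")
    fmt.Println("TARGETS =", encoded)

    // the helper side splits the value back into (name, namespace, container)
    // triples; entries are assumed to be well formed here
    for _, t := range strings.Split(encoded, ";") {
        parts := strings.SplitN(t, ":", 3)
        fmt.Printf("pod=%s namespace=%s container=%s\n", parts[0], parts[1], parts[2])
    }
}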

View File

@ -0,0 +1,264 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/vmware"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/vmware/vm-poweroff/types"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var inject, abort chan os.Signal
// InjectVMPowerOffChaos injects the chaos in serial or parallel mode
func InjectVMPowerOffChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, cookie string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMPowerOffFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Fetching the target VM Ids
vmIdList := strings.Split(experimentsDetails.VMIds, ",")
// Calling the abortWatcher goroutine, which continuously watches for the abort signal and generates the required events and result
go abortWatcher(experimentsDetails, vmIdList, clients, resultDetails, chaosDetails, eventsDetails, cookie)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
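// exampleSignalRelay is an editor's illustrative sketch (not part of this change-set) of the
// inject/abort relay pattern set up above: SIGINT/SIGTERM are forwarded to a buffered channel
// so that a watcher goroutine can revert the chaos before the process exits.
func exampleSignalRelay() {
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-stop
		fmt.Println("abort signal received: revert chaos, then exit")
		os.Exit(1)
	}()
}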
// injectChaosInSerialMode stops VMs in serial mode i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if an abort signal is received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM Id list, %v", vmIdList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, vmId := range vmIdList {
//Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
}
common.SetTargets(vmId, "injected", "VM", chaosDetails)
//Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "VM shutdown failed")
}
//Run the probes during the chaos
//The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
//Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "failed to start back vm")
}
//Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "vm failed to start")
}
common.SetTargets(vmId, "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops VMs in parallel mode i.e. all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if an abort signal is received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target VM Id list, %v", vmIdList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos in VM"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for _, vmId := range vmIdList {
//Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
}
common.SetTargets(vmId, "injected", "VM", chaosDetails)
}
for _, vmId := range vmIdList {
//Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "vm failed to shutdown")
}
}
//Running the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//Waiting for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
common.WaitForDuration(experimentsDetails.ChaosInterval)
for _, vmId := range vmIdList {
//Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, fmt.Sprintf("failed to start back %s vm", vmId))
}
}
for _, vmId := range vmIdList {
//Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return stacktrace.Propagate(err, "vm failed to successfully start")
}
}
for _, vmId := range vmIdList {
common.SetTargets(vmId, "reverted", "VM", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, cookie string) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, vmId := range vmIdList {
vmStatus, err := vmware.GetVMStatus(experimentsDetails.VcenterServer, vmId, cookie)
if err != nil {
log.Errorf("failed to get vm status of %s when an abort signal is received: %s", vmId, err.Error())
}
if vmStatus != "POWERED_ON" {
log.Infof("[Abort]: Waiting for the VM %s to shutdown", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
log.Errorf("vm %s failed to successfully shutdown when an abort signal was received: %s", vmId, err.Error())
}
log.Infof("[Abort]: Starting %s VM as abort signal has been received", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
log.Errorf("vm %s failed to start when an abort signal was received: %s", vmId, err.Error())
}
}
common.SetTargets(vmId, "reverted", "VM", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

View File

@ -1,270 +0,0 @@
package lib
import (
"strconv"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/openebs/maya/pkg/util/retry"
"github.com/pkg/errors"
appsv1 "k8s.io/api/apps/v1"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.ChaosServiceAccount == "" {
// Getting the serviceAccountName for the powerfulseal pod
err := GetServiceAccount(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
}
}
// generating a unique string which can be appended to the powerfulseal deployment name & labels for unique identification
runID := common.GetRunID()
// generating the chaos inject event in the chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Creating configmap for powerfulseal deployment
err := CreateConfigMap(experimentsDetails, clients, runID)
if err != nil {
return err
}
// Creating powerfulseal deployment
err = CreatePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
//checking the status of the powerfulseal pod, wait till the powerfulseal pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "name=powerfulseal-"+runID, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("powerfulseal pod is not in running state, err: %v", err)
}
// Wait for Chaos Duration
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
//Deleting the powerfulseal deployment
log.Info("[Cleanup]: Deleting the powerfulseal deployment")
err = DeletePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal deployment, err: %v", err)
}
//Deleting the powerfulseal configmap
log.Info("[Cleanup]: Deleting the powerfulseal configmap")
err = DeletePowerfulsealConfigmap(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal configmap, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// GetServiceAccount finds the serviceAccountName for the powerfulseal deployment
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
// CreateConfigMap creates a configmap for the powerfulseal deployment
func CreateConfigMap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
data := make(map[string]string, 0)
// It will store all the details inside a string in a well-formatted way
policy := GetConfigMapData(experimentsDetails)
data["policy"] = policy
configMap := &apiv1.ConfigMap{
ObjectMeta: v1.ObjectMeta{
Name: "policy-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"name": "policy-" + runID,
},
},
Data: data,
}
_, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Create(configMap)
return err
}
// GetConfigMapData generates the configmap data for the powerfulseal deployment in the desired format
func GetConfigMapData(experimentsDetails *experimentTypes.ExperimentDetails) string {
policy := "config:" + "\n" +
" minSecondsBetweenRuns: 1" + "\n" +
" maxSecondsBetweenRuns: " + strconv.Itoa(experimentsDetails.ChaosInterval) + "\n" +
"podScenarios:" + "\n" +
" - name: \"delete random pods in application namespace\"" + "\n" +
" match:" + "\n" +
" - labels:" + "\n" +
" namespace: " + experimentsDetails.AppNS + "\n" +
" selector: " + experimentsDetails.AppLabel + "\n" +
" filters:" + "\n" +
" - randomSample:" + "\n" +
" size: 1" + "\n" +
" actions:" + "\n" +
" - kill:" + "\n" +
" probability: 0.77" + "\n" +
" force: " + strconv.FormatBool(experimentsDetails.Force)
return policy
}
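// renderedPolicySketch is an editor's illustration (not part of the original file) of roughly
// what GetConfigMapData assembles, using hypothetical values ChaosInterval=10, AppNS="default",
// AppLabel="app=nginx", Force=false; the exact whitespace comes from the string literals above.
const renderedPolicySketch = `config:
  minSecondsBetweenRuns: 1
  maxSecondsBetweenRuns: 10
podScenarios:
  - name: "delete random pods in application namespace"
    match:
      - labels:
          namespace: default
          selector: app=nginx
    filters:
      - randomSample:
          size: 1
    actions:
      - kill:
          probability: 0.77
          force: false`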
// CreatePowerfulsealDeployment derives the attributes for the powerfulseal deployment and creates it
func CreatePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
deployment := &appsv1.Deployment{
ObjectMeta: v1.ObjectMeta{
Name: "powerfulseal-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": "powerfulseal",
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
},
Spec: appsv1.DeploymentSpec{
Selector: &v1.LabelSelector{
MatchLabels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Replicas: func(i int32) *int32 { return &i }(1),
Template: apiv1.PodTemplateSpec{
ObjectMeta: v1.ObjectMeta{
Labels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Spec: apiv1.PodSpec{
Volumes: []apiv1.Volume{
{
Name: "policyfile",
VolumeSource: apiv1.VolumeSource{
ConfigMap: &apiv1.ConfigMapVolumeSource{
LocalObjectReference: apiv1.LocalObjectReference{
Name: "policy-" + runID,
},
},
},
},
},
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: func(i int64) *int64 { return &i }(0),
Containers: []apiv1.Container{
{
Name: "powerfulseal",
Image: "ksatchit/miko-powerfulseal:non-ssh",
Args: []string{
"autonomous",
"--inventory-kubernetes",
"--no-cloud",
"--policy-file=/root/policy_kill_random_default.yml",
"--use-pod-delete-instead-of-ssh-kill",
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "policyfile",
MountPath: "/root/policy_kill_random_default.yml",
SubPath: "policy",
},
},
},
},
},
},
},
}
_, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Create(deployment)
return err
}
//DeletePowerfulsealDeployment deletes the powerfulseal deployment
func DeletePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Delete("powerfulseal-"+runID, &v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).List(v1.ListOptions{LabelSelector: "name=powerfulseal-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("Deployment is not deleted yet, err: %v", err)
}
return nil
})
return err
}
//DeletePowerfulsealConfigmap deletes the powerfulseal configmap
func DeletePowerfulsealConfigmap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Delete("policy-"+runID, &v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).List(v1.ListOptions{LabelSelector: "name=policy-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("configmap is not deleted yet, err: %v", err)
}
return nil
})
return err
}

View File

@ -1,359 +0,0 @@
package lib
import (
"strconv"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pods are not defined, it will derive a random target pod list using the pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = GetTargetContainer(experimentsDetails, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("Unable to get the target container name, err: %v", err)
}
}
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// InjectChaosInSerialMode kills the target container of all target application pods serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
//GetRestartCount returns the restart count of the target container
restartCountBefore := GetRestartCount(pod, experimentsDetails.TargetContainer)
log.Infof("restartCount of target container before chaos injection: %v", restartCountBefore)
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"Target Container": experimentsDetails.TargetContainer,
})
err := CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// It will verify that the restart count of the container has increased after chaos injection
err = VerifyRestartCount(experimentsDetails, pod, clients, restartCountBefore)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("Target container is not restarted, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// InjectChaosInParallelMode kills the target container of all target application pods in parallel (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
//GetRestartCountAll returns the restart counts of all the target containers
restartCountBefore := GetRestartCountAll(targetPodList, experimentsDetails.TargetContainer)
log.Infof("restartCount of target containers before chaos injection: %v", restartCountBefore)
labelSuffix := common.GetRunID()
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"Target Container": experimentsDetails.TargetContainer,
})
err := CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err := status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// It will verify that the restart count of each container has increased after chaos injection
err = VerifyRestartCountAll(experimentsDetails, targetPodList, clients, restartCountBefore)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("Target container is not restarted , err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
return nil
}
//GetTargetContainer will fetch the container name from application pod
//This container will be used as target container
func GetTargetContainer(experimentsDetails *experimentTypes.ExperimentDetails, appName string, clients clients.ClientSets) (string, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(appName, v1.GetOptions{})
if err != nil {
return "", err
}
return pod.Spec.Containers[0].Name, nil
}
//GetRestartCount returns the restart count of the target container
func GetRestartCount(targetPod apiv1.Pod, containerName string) int {
restartCount := 0
for _, container := range targetPod.Status.ContainerStatuses {
if container.Name == containerName {
restartCount = int(container.RestartCount)
break
}
}
return restartCount
}
//GetRestartCountAll returns the restart counts of all the target containers
func GetRestartCountAll(targetPodList apiv1.PodList, containerName string) []int {
restartCount := []int{}
for _, pod := range targetPodList.Items {
restartCount = append(restartCount, GetRestartCount(pod, containerName))
}
return restartCount
}
//VerifyRestartCount verifies whether the target container was restarted after chaos injection;
// the restart count of the container should increase after chaos injection
func VerifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, pod apiv1.Pod, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0
err := retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(pod.Name, v1.GetOptions{})
if err != nil {
return err
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
restartCountAfter = int(container.RestartCount)
break
}
}
return nil
})
if err != nil {
return err
}
// it will fail if the restart count has not increased
if restartCountAfter <= restartCountBefore {
return errors.Errorf("Target container is not restarted")
}
log.Infof("restartCount of target container after chaos injection: %v", restartCountAfter)
return nil
}
//VerifyRestartCountAll verifies whether all the target containers were restarted after chaos injection;
// the restart count of each container should increase after chaos injection
func VerifyRestartCountAll(experimentsDetails *experimentTypes.ExperimentDetails, podList apiv1.PodList, clients clients.ClientSets, restartCountBefore []int) error {
for index, pod := range podList.Items {
if err := VerifyRestartCount(experimentsDetails, pod, clients, restartCountBefore[index]); err != nil {
return err
}
}
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
InitContainers: []apiv1.Container{
{
Name: "setup-" + experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
"-c",
"sudo chmod 777 " + experimentsDetails.SocketPath,
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"pumba",
},
Args: []string{
"--random",
"--interval",
strconv.Itoa(experimentsDetails.ChaosInterval) + "s",
"kill",
"--signal",
experimentsDetails.Signal,
"re2:k8s_" + experimentsDetails.TargetContainer + "_" + appName,
},
Resources: experimentsDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
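// exampleKillArgs is an editor's illustrative sketch (not part of the original file) of the
// pumba invocation assembled above, using hypothetical values ChaosInterval=10, Signal="SIGKILL",
// target container "nginx" and target pod "nginx-abc"; the helper effectively runs:
//   pumba --random --interval 10s kill --signal SIGKILL re2:k8s_nginx_nginx-abc
func exampleKillArgs() []string {
	return []string{
		"--random",
		"--interval", "10s",
		"kill",
		"--signal", "SIGKILL",
		"re2:k8s_nginx_nginx-abc",
	}
}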

View File

@ -1,274 +0,0 @@
package lib
import (
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
// PreparePodCPUHog contains the preparation steps before chaos injection
func PreparePodCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// InjectChaosInSerialMode stresses the CPU of all target application pods serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform cpu chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"CPUcores": experimentsDetails.CPUcores,
})
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// InjectChaosInParallelMode stresses the CPU of all target application pods in parallel (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform cpu chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"CPUcores": experimentsDetails.CPUcores,
})
if err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
// prevent pumba from killing itself
"com.gaiaadm.pumba": "true",
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
InitContainers: []apiv1.Container{
{
Name: "setup-pumba-stress",
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
"-c",
"sudo chmod 777 " + experimentsDetails.SocketPath,
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"pumba",
},
Args: GetContainerArguments(experimentsDetails, appName),
Resources: experimentsDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// GetContainerArguments derives the args for the pumba stress helper pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stressors",
"--cpu " + strconv.Itoa(experimentsDetails.CPUcores) + " --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}
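// exampleStressArgs is an editor's illustrative sketch (not part of the original file) of the
// arguments GetContainerArguments yields for hypothetical values ChaosDuration=60, CPUcores=2
// and a target pod "nginx-abc"; the helper effectively runs:
//   pumba --log-level debug --label io.kubernetes.pod.name=nginx-abc stress --duration 60s --stressors "--cpu 2 --timeout 60s"
func exampleStressArgs() []string {
	return []string{
		"--log-level", "debug",
		"--label", "io.kubernetes.pod.name=nginx-abc",
		"stress",
		"--duration", "60s",
		"--stressors", "--cpu 2 --timeout 60s",
	}
}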

View File

@ -1,275 +0,0 @@
package lib
import (
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
// PreparePodMemoryHog contains the preparation steps before chaos injection
func PreparePodMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// InjectChaosInSerialMode stresses the memory of all target application pods serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform memory chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"MemoryBytes": experimentsDetails.MemoryConsumption,
})
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// InjectChaosInParallelMode stresses the memory of all target application pods in parallel (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform memory chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"MemoryBytes": experimentsDetails.MemoryConsumption,
})
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
// prevent pumba from killing itself
"com.gaiaadm.pumba": "true",
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
InitContainers: []apiv1.Container{
{
Name: "setup-pumba-stress",
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
"-c",
"sudo chmod 777 " + experimentsDetails.SocketPath,
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"pumba",
},
Args: GetContainerArguments(experimentsDetails, appName),
Resources: experimentsDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// GetContainerArguments derives the args for the pumba stress helper pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stressors",
"--cpu 1 --vm 1 --vm-bytes " + strconv.Itoa(experimentsDetails.MemoryConsumption) + "M --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}
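// exampleMemStressArgs is an editor's illustrative sketch (not part of the original file) of the
// arguments GetContainerArguments yields for hypothetical values ChaosDuration=60,
// MemoryConsumption=500 and a target pod "nginx-abc"; the resulting stressor string is
// "--cpu 1 --vm 1 --vm-bytes 500M --timeout 60s".
func exampleMemStressArgs() []string {
	return []string{
		"--log-level", "debug",
		"--label", "io.kubernetes.pod.name=nginx-abc",
		"stress",
		"--duration", "60s",
		"--stressors", "--cpu 1 --vm 1 --vm-bytes 500M --timeout 60s",
	}
}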

View File

@ -1,44 +0,0 @@
package corruption
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
var err error
//PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := GetContainerArguments(experimentsDetails)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
}
return nil
}
// GetContainerArguments derives the args for the pumba pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
baseArgs := []string{
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args = network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, args)
args = network_chaos.AddTargetIpsArgs(network_chaos.GetIpsForTargetHosts(experimentsDetails.DestinationHosts), args)
args = append(args, "corrupt", "--percent", strconv.Itoa(experimentsDetails.NetworkPacketCorruptionPercentage))
return args
}
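// exampleCorruptionArgs is an editor's illustrative sketch (not part of the original file) of
// the netem arguments assembled above for hypothetical values TCImage="gaiadocker/iproute2",
// NetworkInterface="eth0", ChaosDuration=60, NetworkPacketCorruptionPercentage=100, assuming no
// destination IPs/hosts are set (so AddTargetIpsArgs is assumed to contribute nothing here).
func exampleCorruptionArgs() []string {
	return []string{
		"netem",
		"--tc-image", "gaiadocker/iproute2",
		"--interface", "eth0",
		"--duration", "60s",
		"corrupt",
		"--percent", "100",
	}
}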

View File

@ -1,44 +0,0 @@
package duplication
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
var err error
//PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := GetContainerArguments(experimentsDetails)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
}
return nil
}
// GetContainerArguments derives the args for the pumba pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
baseArgs := []string{
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args = network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, args)
args = network_chaos.AddTargetIpsArgs(network_chaos.GetIpsForTargetHosts(experimentsDetails.DestinationHosts), args)
args = append(args, "duplicate", "--percent", strconv.Itoa(experimentsDetails.NetworkPacketDuplicationPercentage))
return args
}

View File

@ -1,44 +0,0 @@
package latency
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
var err error
//PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := GetContainerArguments(experimentsDetails)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
}
return nil
}
// GetContainerArguments derives the args for the pumba pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
baseArgs := []string{
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args = network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, args)
args = network_chaos.AddTargetIpsArgs(network_chaos.GetIpsForTargetHosts(experimentsDetails.DestinationHosts), args)
args = append(args, "delay", "--time", strconv.Itoa(experimentsDetails.NetworkLatency))
return args
}

View File

@ -1,44 +0,0 @@
package loss
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
var err error
//PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args := GetContainerArguments(experimentsDetails)
err = network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
if err != nil {
return err
}
return nil
}
// GetContainerArguments derives the args for the pumba pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) []string {
baseArgs := []string{
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args = network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, args)
args = network_chaos.AddTargetIpsArgs(network_chaos.GetIpsForTargetHosts(experimentsDetails.DestinationHosts), args)
args = append(args, "loss", "--percent", strconv.Itoa(experimentsDetails.NetworkPacketLossPercentage))
return args
}

View File

@@ -1,304 +0,0 @@
package lib
import (
"net"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
// PrepareAndInjectChaos contains the preparation and chaos injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args []string) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// InjectChaosInSerialMode injects network chaos on all target applications serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args []string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
})
// args contains details of the specific chaos injection
// construct `argsWithRegex` with the pod-specific container regex appended;
// copying args first ensures the shared slice's backing array is never mutated across iterations
argsWithRegex := append(append([]string{}, args...), "re2:k8s_POD_"+pod.Name+"_"+experimentsDetails.AppNS)
log.Infof("Arguments for running %v are %v", experimentsDetails.ExperimentName, argsWithRegex)
err = CreateHelperPod(experimentsDetails, clients, pod.Spec.NodeName, runID, argsWithRegex, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return err
}
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, chaosDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// InjectChaosInParallelMode injects network chaos on all target applications in parallel (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args []string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
})
// args contains details of the specific chaos injection
// construct `argsWithRegex` with the pod-specific container regex appended;
// copying args first ensures the shared slice's backing array is never mutated across iterations
argsWithRegex := append(append([]string{}, args...), "re2:k8s_POD_"+pod.Name+"_"+experimentsDetails.AppNS)
log.Infof("Arguments for running %v are %v", experimentsDetails.ExperimentName, argsWithRegex)
err = CreateHelperPod(experimentsDetails, clients, pod.Spec.NodeName, runID, argsWithRegex, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", experimentsDetails.ChaosDuration)
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, chaosDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appNodeName, runID string, args []string, labelSuffix string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
InitContainers: []apiv1.Container{
{
Name: "setup-" + experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
"-c",
"sudo chmod 777 " + experimentsDetails.SocketPath,
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"pumba",
},
Args: args,
Resources: experimentsDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// AddTargetIpsArgs appends a --target flag for each entry in the comma-separated target IP list (if provided by the user) to the pumba args
func AddTargetIpsArgs(targetIPs string, args []string) []string {
if targetIPs == "" {
return args
}
ips := strings.Split(targetIPs, ",")
for i := range ips {
args = append(args, "--target", strings.TrimSpace(ips[i]))
}
return args
}
// GetIpsForTargetHosts resolves the IP addresses for a comma-separated list of target hosts and returns them as a comma-separated string
func GetIpsForTargetHosts(targetHosts string) string {
if targetHosts == "" {
return ""
}
hosts := strings.Split(targetHosts, ",")
var commaSeparatedIPs []string
for i := range hosts {
ips, err := net.LookupIP(hosts[i])
if err != nil {
log.Infof("Unknown host")
} else {
for j := range ips {
log.Infof("IP address: %v", ips[j])
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String())
}
}
}
return strings.Join(commaSeparatedIPs, ",")
}
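
Taken together, the two helpers above feed the netem target list: user-supplied IPs are appended directly as --target flags, and hostnames are first resolved to IPs. The following standalone sketch mirrors that composition with local copies of the same logic; the inputs (`10.0.0.5`, `example.com`) and the starting args are hypothetical and used only for illustration.

package main

import (
    "fmt"
    "net"
    "strings"
)

// addTargetIpsArgs mirrors AddTargetIpsArgs above: one "--target <ip>" pair per entry.
func addTargetIpsArgs(targetIPs string, args []string) []string {
    if targetIPs == "" {
        return args
    }
    for _, ip := range strings.Split(targetIPs, ",") {
        args = append(args, "--target", strings.TrimSpace(ip))
    }
    return args
}

// getIpsForTargetHosts mirrors GetIpsForTargetHosts above: resolve each host, keep whatever resolves.
func getIpsForTargetHosts(targetHosts string) string {
    if targetHosts == "" {
        return ""
    }
    var ips []string
    for _, host := range strings.Split(targetHosts, ",") {
        resolved, err := net.LookupIP(host)
        if err != nil {
            fmt.Printf("unknown host: %v\n", host)
            continue
        }
        for _, ip := range resolved {
            ips = append(ips, ip.String())
        }
    }
    return strings.Join(ips, ",")
}

func main() {
    // Hypothetical inputs standing in for DestinationIPs and DestinationHosts.
    args := []string{"netem", "--duration", "60s"}
    args = addTargetIpsArgs("10.0.0.5", args)                          // direct IPs
    args = addTargetIpsArgs(getIpsForTargetHosts("example.com"), args) // resolved hostnames
    fmt.Println(args)
}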

View File

@@ -1,299 +0,0 @@
package lib
import (
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-io-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var err error
// PreparePodIOStress contains the preparation steps before chaos injection
func PreparePodIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
// Get Chaos Pod Annotation
experimentsDetails.Annotations, err = common.GetChaosPodAnnotation(experimentsDetails.ChaosPodName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("unable to get annotations, err: %v", err)
}
// Get Resource Requirements
experimentsDetails.Resources, err = common.GetChaosPodResourceRequirements(experimentsDetails.ChaosPodName, experimentsDetails.ExperimentName, experimentsDetails.ChaosNamespace, clients)
if err != nil {
return errors.Errorf("Unable to get resource requirements, err: %v", err)
}
}
if experimentsDetails.Sequence == "serial" {
if err = InjectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
}
} else {
if err = InjectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails); err != nil {
return err
}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// InjectChaosInSerialMode injects IO stress chaos on all target applications serially (one by one)
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform io stress chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
})
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// InjectChaosInParallelMode injects IO stress chaos on all target applications in parallel (all at once)
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
// creating the helper pod to perform io stress chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
})
err = CreateHelperPod(experimentsDetails, clients, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of helper pod
log.Infof("[Wait]: Waiting for %vs till the completion of the helper pod", strconv.Itoa(experimentsDetails.ChaosDuration+30))
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+30, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod failed due to, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("Unable to delete the helper pod, err: %v", err)
}
return nil
}
// CreateHelperPod derives the attributes for the helper pod and creates it
func CreateHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": experimentsDetails.ExperimentName + "-helper-" + labelSuffix,
"name": experimentsDetails.ExperimentName + "-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
// prevent pumba from killing itself
"com.gaiaadm.pumba": "true",
},
Annotations: experimentsDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
InitContainers: []apiv1.Container{
{
Name: "setup-pumba-stress",
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"/bin/bash",
"-c",
"sudo chmod 777 " + experimentsDetails.SocketPath,
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"pumba",
},
Args: GetContainerArguments(experimentsDetails, appName),
Resources: experimentsDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(helperPod)
return err
}
// GetContainerArguments derives the args for the pumba stress helper pod
func GetContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
var hddbytes string
if experimentsDetails.FilesystemUtilizationBytes == 0 {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = "10%"
log.Info("Neither of FilesystemUtilizationPercentage or FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
}
} else {
if experimentsDetails.FilesystemUtilizationPercentage == 0 {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationBytes) + "G"
} else {
hddbytes = strconv.Itoa(experimentsDetails.FilesystemUtilizationPercentage) + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
stressArgs := []string{
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stressors",
}
args := stressArgs
if experimentsDetails.VolumeMountPath == "" {
args = append(args, "--cpu 1 --io "+strconv.Itoa(experimentsDetails.NumberOfWorkers)+" --hdd "+strconv.Itoa(experimentsDetails.NumberOfWorkers)+" --hdd-bytes "+hddbytes+" --timeout "+strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
} else {
args = append(args, "--cpu 1 --io "+strconv.Itoa(experimentsDetails.NumberOfWorkers)+" --hdd "+strconv.Itoa(experimentsDetails.NumberOfWorkers)+" --hdd-bytes "+hddbytes+" --temp-path "+experimentsDetails.VolumeMountPath+" --timeout "+strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
}
return args
}
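
Put differently, the branching above only decides the --hdd-bytes value (the percentage wins if both inputs are set, and 10% is used if neither is), and the whole stressor set is handed to pumba stress as one string via --stressors. Below is a minimal sketch of how that string is assembled, with hypothetical worker, duration, and byte values that are not taken from the chart defaults.

package main

import (
    "fmt"
    "strconv"
)

func main() {
    // Hypothetical stand-ins for the ExperimentDetails fields used above.
    numberOfWorkers := 4
    chaosDuration := 120
    hddbytes := "10%" // the default when neither percentage nor bytes is provided

    stressor := "--cpu 1 --io " + strconv.Itoa(numberOfWorkers) +
        " --hdd " + strconv.Itoa(numberOfWorkers) +
        " --hdd-bytes " + hddbytes +
        " --timeout " + strconv.Itoa(chaosDuration) + "s"

    // The helper pod runs roughly:
    //   pumba --log-level debug --label io.kubernetes.pod.name=<pod> stress \
    //     --duration <chaosDuration>s --stressors "<stressor>"
    fmt.Println(stressor)
}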

Some files were not shown because too many files have changed in this diff.