Compare commits

..

112 Commits

Author SHA1 Message Date
Neelanjan Manna e7b4e7dbe4
chore: adds retries with timeout for litmus and k8s client operations (#766)
* chore: adds retries for k8s api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds retries for litmus api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 15:41:34 +05:30
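As an illustration of the pattern this commit describes (retries with an overall timeout around Kubernetes and Litmus client calls), the sketch below is a minimal, hypothetical version using only the Go standard library; `flaky` and `retryWithTimeout` are stand-ins, not the actual litmus-go helpers.

```go
// retry_sketch.go — minimal illustrative sketch, not the litmus-go implementation.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithTimeout calls op until it succeeds, the attempt budget is spent,
// or the overall timeout expires, whichever comes first.
func retryWithTimeout(ctx context.Context, attempts int, delay, timeout time.Duration, op func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = op(ctx); lastErr == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return errors.Join(ctx.Err(), lastErr)
		case <-time.After(delay):
			// back off before the next attempt
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	// Hypothetical flaky operation standing in for a Kubernetes or Litmus API call.
	flaky := func(ctx context.Context) error {
		calls++
		if calls < 3 {
			return fmt.Errorf("transient failure %d", calls)
		}
		return nil
	}
	if err := retryWithTimeout(context.Background(), 5, 100*time.Millisecond, 2*time.Second, flaky); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("succeeded after", calls, "attempts")
}
```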
Neelanjan Manna 62a4986c78
chore: adds common functions for helper pod lifecycle management (#764)
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 12:18:29 +05:30
Neelanjan Manna d626cf3ec4
Merge pull request #761 from litmuschaos/CHAOS-9404
feat: adds port filtering for ip/hostnames for network faults, adds pod-network-rate-limit fault
2025-08-13 16:40:51 +05:30
neelanjan00 59125424c3
feat: adds ip+port filtering, adds pod-network-rate-limit fault
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-13 16:13:24 +05:30
Neelanjan Manna 2e7ff836fc
feat: Adds multi container support for pod stress faults (#757)
* chore: Fix typo in log statement

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds multi-container stress chaos system with improved lifecycle management and better error handling

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-08-13 16:04:20 +05:30
Prexy e61d5b33be
written test for `workload.go` in `pkg/workloads` (#767)
* written test for workload.go in pkg/workloads

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* checking go formatting

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-08-12 17:30:22 +05:30
Prexy 14fe30c956
test: add unit tests for exec.go file in pkg/utils folder (#755)
* test: add unit tests for exec.go file in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* creating table driven test and also updates TestCheckPodStatus

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-07-24 15:33:25 +05:30
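The "table driven test" mentioned above follows the standard Go idiom of one slice of cases plus a subtest per case. The sketch below shows that idiom with a stand-in helper; `clamp` is hypothetical and is not part of pkg/utils.

```go
// clamp_test.go — illustrative only; clamp is a stand-in helper, not the litmus-go code.
package utils

import "testing"

// clamp bounds v to the closed interval [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

func TestClamp(t *testing.T) {
	tests := []struct {
		name            string
		v, lo, hi, want int
	}{
		{"below range", -5, 0, 100, 0},
		{"within range", 42, 0, 100, 42},
		{"above range", 250, 0, 100, 100},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			if got := clamp(tc.v, tc.lo, tc.hi); got != tc.want {
				t.Errorf("clamp(%d, %d, %d) = %d, want %d", tc.v, tc.lo, tc.hi, got, tc.want)
			}
		})
	}
}
```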
Prexy 4ae08899e0
test: add unit tests for retry.go in pkg/utils folder (#754)
* test: add unit tests for retry.go in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:42 +05:30
Prexy 2c38220cca
test: add unit tests for RandStringBytesMask and GetRunID in stringutils (#753)
* test: add unit tests for RandStringBytesMask and GetRunID in stringutils

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:26 +05:30
Sami S. 07de11eeee
Fix: handle pagination in ssm describeInstanceInformation & API Rate Limit (#738)
* Fix: handle pagination in ssm describe

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* implement exponential backoff with jitter for API rate limiting

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Refactor

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixup

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Fix: include error message from stderr if container-kill fails (#740) (#741)

Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(logs): Fix the error logs for container-kill fault (#745)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(container-kill): Fixed the container stop command timeout issue (#747)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* feat: Add a rds-instance-stop chaos fault (#710)

* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix go fmt ./...

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Filter instances on api call

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixes lint

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>

---------

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>
Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Jongwoo Han <jongwooo.han@gmail.com>
Co-authored-by: Udit Gaurav <udit.gaurav@harness.io>
2025-04-30 10:25:10 +05:30
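The pagination and rate-limit handling described in this commit follows a common shape: walk NextToken pages and retry throttled calls with exponential backoff plus jitter. The sketch below shows that general shape only; `page`, `fetch`, and `listAll` are hypothetical stand-ins, not the actual ssm-operations.go code or the AWS SDK API.

```go
// paginate_sketch.go — hypothetical sketch of NextToken pagination with
// exponential backoff and full jitter.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// page mimics one page of a Describe*-style response.
type page struct {
	Items     []string
	NextToken string
}

// listAll walks every page returned by fetch, retrying failed calls with
// exponential backoff plus jitter so concurrent callers do not retry in lockstep.
func listAll(fetch func(token string) (page, error), maxRetries int) ([]string, error) {
	var all []string
	token := ""
	for {
		var (
			p   page
			err error
		)
		backoff := 200 * time.Millisecond
		for attempt := 0; ; attempt++ {
			if p, err = fetch(token); err == nil {
				break
			}
			if attempt == maxRetries {
				return nil, fmt.Errorf("describe call failed after %d retries: %w", maxRetries, err)
			}
			// Full jitter: sleep a random duration in [0, backoff), then double the cap.
			time.Sleep(time.Duration(rand.Int63n(int64(backoff))))
			backoff *= 2
		}
		all = append(all, p.Items...)
		if p.NextToken == "" { // last page
			return all, nil
		}
		token = p.NextToken
	}
}

func main() {
	// Two fake pages, no throttling, just to show the call shape.
	pages := map[string]page{
		"":   {Items: []string{"i-aaa", "i-bbb"}, NextToken: "t1"},
		"t1": {Items: []string{"i-ccc"}},
	}
	ids, err := listAll(func(token string) (page, error) { return pages[token], nil }, 3)
	fmt.Println(ids, err)
}
```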
Jongwoo Han 5c22472290
feat: Add a rds-instance-stop chaos fault (#710)
* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2025-04-24 12:54:05 +05:30
Shubham Chaudhary e7b3fb6f9f
fix(container-kill): Fixed the container stop command timeout issue (#747)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-04-15 18:20:23 +05:30
Shubham Chaudhary e1eaea9110
fix(logs): Fix the error logs for container-kill fault (#745)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-04-03 15:35:00 +05:30
Björn Kylberg 491dc5e31a
Fix: include error message from stderr if container-kill fails (#740) (#741)
Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
2025-04-03 14:44:05 +05:30
Shubham Chaudhary caae228e35
(chore): fix the go fmt of the files (#734)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-01-17 12:08:34 +05:30
kbfu 34a62d87f3
fix the cgroup 2 problem (#677)
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-01-17 11:29:30 +05:30
Suhyen Im 8246ff891b
feat: propagate trace context to helper pods (#722)
Signed-off-by: Suhyen Im <suhyenim.kor@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Saranya Jena <saranya.jena@harness.io>
2025-01-15 16:34:19 +05:30
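Propagating trace context to helper pods generally means serializing the current span context into W3C trace-context headers and handing them to the helper, for example through environment variables. The sketch below shows that approach with OpenTelemetry's propagation API; the TRACE_PARENT/TRACE_STATE variable names and the helper-pod wiring are assumptions for illustration, not the exact litmus-go implementation.

```go
// traceenv_sketch.go — illustrative sketch of trace-context propagation to a helper pod.
package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	corev1 "k8s.io/api/core/v1"
)

// traceEnvForHelper serializes the current span context into W3C trace-context
// headers and returns them as env vars that could be set on a helper pod spec.
func traceEnvForHelper(ctx context.Context) []corev1.EnvVar {
	carrier := propagation.MapCarrier{}
	otel.GetTextMapPropagator().Inject(ctx, carrier)

	var envs []corev1.EnvVar
	if tp := carrier.Get("traceparent"); tp != "" {
		envs = append(envs, corev1.EnvVar{Name: "TRACE_PARENT", Value: tp})
	}
	if ts := carrier.Get("tracestate"); ts != "" {
		envs = append(envs, corev1.EnvVar{Name: "TRACE_STATE", Value: ts})
	}
	return envs
}

func main() {
	// A TraceContext propagator must be registered, and ctx must carry an
	// active span, for the carrier to be populated; with a bare context the
	// result is simply empty.
	otel.SetTextMapPropagator(propagation.TraceContext{})
	fmt.Println(traceEnvForHelper(context.Background()))
}
```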
Namkyu Park 9b29558585
feat: export k6 results output to the OTEL collector (#726)
* Export k6 results to the otel collector

Signed-off-by: namkyu1999 <lak9348@gmail.com>

* add envs for multiple projects

Signed-off-by: namkyu1999 <lak9348@gmail.com>

---------

Signed-off-by: namkyu1999 <lak9348@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Saranya Jena <saranya.jena@harness.io>
2025-01-15 16:33:43 +05:30
Sayan Mondal c7ab5a3d7c
Merge pull request #732 from heysujal/add-openssh-clients
add openssh-clients to dockerfile
2025-01-15 11:28:17 +05:30
Shubham Chaudhary 3bef3ad67e
Merge branch 'master' into add-openssh-clients 2025-01-15 10:57:02 +05:30
Sujal Gupta b2f68a6ad1
use revertErr instead of err (#730)
Signed-off-by: Sujal Gupta <sujalgupta6100@gmail.com>
2025-01-15 10:38:32 +05:30
Sujal Gupta cd2ec26083 add openssh-clients to dockerfile
Signed-off-by: Sujal Gupta <sujalgupta6100@gmail.com>
2025-01-06 01:04:25 +05:30
Shubham Chaudhary 7e08c69750
chore(stress): Fix the stress faults (#723)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-11-20 15:18:59 +05:30
Namkyu Park 3ef23b01f9
feat: implement opentelemetry for distributed tracing (#706)
* feat: add otel & tracing for distributed tracing

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* feat: add tracing codes to chaslib

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: misc

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: make otel optional

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: skip if litmus-go not received trace_parent

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: Set context.Context as a parameter in each function

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* update templates

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* feat: rename spans and enhance coverage

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: avoid shadowing

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: add logs

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: add logs

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: fix templates

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

---------

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-10-24 16:14:57 +05:30
Shubham Chaudhary 0cd6c6fae3
(chore): Fix the build, push, and release pipelines (#716)
* (chore): Fix the build, push, and release pipelines

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* (chore): Fix the dockerfile

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-10-15 23:33:54 +05:30
Shubham Chaudhary 6a386d1410
(chore): Fix the disk-fill fault (#715)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-10-15 22:15:14 +05:30
Vedant Shrotria fc646d678c
Merge pull request #707 from dusdjhyeon/ubi-migration
UBI migration of Images - go-runner
2024-08-23 11:32:44 +05:30
dusdjhyeon 6257c1abb8
feat: add build arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 16:13:18 +09:00
dusdjhyeon 755a562efe
Merge branch 'ubi-migration' of https://github.com/dusdjhyeon/litmus-go into ubi-migration 2024-08-22 16:10:37 +09:00
dusdjhyeon d0814df9ea
fix: set build args
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 16:09:40 +09:00
Vedant Shrotria a6012039fd
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:42 +05:30
Vedant Shrotria a1f602ba98
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:33 +05:30
Vedant Shrotria 7476994a36
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:25 +05:30
Vedant Shrotria 3440fb84eb
Update .github/workflows/release.yml 2024-08-22 11:18:46 +05:30
Vedant Shrotria 652e6b8465
Update .github/workflows/release.yml 2024-08-22 11:18:39 +05:30
Vedant Shrotria 996f3b3f5f
Update .github/workflows/push.yml 2024-08-22 11:18:10 +05:30
Vedant Shrotria e73f3bfb21
Update .github/workflows/push.yml 2024-08-22 11:17:54 +05:30
Vedant Shrotria 054d091dce
Update .github/workflows/build.yml 2024-08-22 11:17:37 +05:30
Vedant Shrotria c362119e05
Update .github/workflows/build.yml 2024-08-22 11:17:15 +05:30
dusdjhyeon 31bf293140
fix: change go version and others
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 14:39:17 +09:00
Vedant Shrotria 9569c8b2f4
Merge branch 'master' into ubi-migration 2024-08-21 16:25:14 +05:30
dusdjhyeon 4f9f4e0540
fix: upgrade version for vulnerability
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:58 +09:00
dusdjhyeon 399ccd68a0
fix: change kubectl crictl latest version
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:58 +09:00
Jongwoo Han 35958eae38
Rename env to EC2_INSTANCE_TAG (#708)
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon 003a3dc02c
fix: change docker repo
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon d4eed32a6d
fix: change version arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon af7322bece
fix: app_dir and yum
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon bd853f6e25
feat: migration base image
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon cfdb205ca3
fix: typos and add arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
Jongwoo Han f051d5ac7c
Rename env to EC2_INSTANCE_TAG (#708)
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2024-08-14 16:42:35 +05:30
Andrii Kotelnikov 10e9b774a8
Update workloads.go (#705)
Fix issue with empty kind field
Signed-off-by: Andrii Kotelnikov <andrusha@ukr.net>
2024-06-14 14:16:47 +05:30
Vedant Shrotria 9689f74fce
Merge pull request #701 from Jonsy13/add-gitleaks
Adding `gitleaks` as PR Check
2024-05-20 10:27:09 +05:30
Vedant Shrotria d273ba628b
Merge branch 'master' into add-gitleaks 2024-05-17 17:37:15 +05:30
Jonsy13 2315eaf2a4
Added gitleaks
Signed-off-by: Jonsy13 <vedant.shrotria@harness.io>
2024-05-17 17:34:36 +05:30
Shubham Chaudhary f2b2c2747a
chore(io-stress): Fix the pod-io-stress experiment (#700)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-05-17 16:43:19 +05:30
Udit Gaurav 66d01011bb
Fix pipeline issues (#694)
Fix pipeline issues

---------

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
2024-04-26 14:17:01 +05:30
Udit Gaurav a440615a51
Fix gofmt issues (#695) 2024-04-25 23:45:59 +05:30
Shubham Chaudhary 78eec36b79
chore(probe): Fix the probe description on failure (#692)
* chore(probe): Fix the probe description on failure

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(probe): Consider http timeout as probe failure

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-04-23 18:06:48 +05:30
Michael Morris b5a24b4044
enable ALL for TARGET_CONTAINER (#683)
Signed-off-by: MichaelMorris <michael.morris@est.tech>
2024-03-14 19:44:18 +05:30
Shubham Chaudhary 6d26c21506
test: Adding fuzz testing for common util (#691)
* test: Adding fuzz testing for common util

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* fix the random interval test

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-03-12 17:02:01 +05:30
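Fuzz tests like the ones referenced above use Go's native fuzzing (Go 1.18+): seed inputs via f.Add, then assert properties that must hold for arbitrary inputs. The sketch below is illustrative only; `sanitizeName` is a stand-in helper, not the actual common util. It runs with `go test -fuzz=FuzzSanitizeName`.

```go
// sanitize_fuzz_test.go — illustrative native fuzz target, not the litmus-go test.
package common

import (
	"strings"
	"testing"
)

// sanitizeName trims surrounding whitespace and lowercases the input.
func sanitizeName(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// FuzzSanitizeName checks properties that must hold for arbitrary inputs.
func FuzzSanitizeName(f *testing.F) {
	f.Add("Pod-Delete")
	f.Add("  chaos runner  ")
	f.Fuzz(func(t *testing.T, in string) {
		out := sanitizeName(in)
		if out != strings.ToLower(out) {
			t.Errorf("sanitizeName(%q) = %q; expected lowercase output", in, out)
		}
		if sanitizeName(out) != out {
			t.Errorf("sanitizeName is not idempotent for %q", in)
		}
	})
}
```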
Namkyu Park 5554a29ea2
chore: fix typos (#690)
Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-03-11 20:26:50 +05:30
Sayan Mondal 5f0d882912
test: Adding fuzz testing for common util (#688) 2024-03-08 21:42:20 +05:30
Namkyu Park eef3b4021d
feat: Add a k6-loadgen chaos fault (#687)
* feat: add k6-loadgen

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-03-07 19:19:51 +05:30
smit thakkar 96f6571e77
fix: accomodate for pending pods with no IP address in network fault (#684)
Signed-off-by: smit thakkar <smit.thakkar@deliveryhero.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-03-01 15:06:07 +05:30
Nageshbansal b9f897be21
Adds support for tolerations in source cmd probe (#681)
Signed-off-by: nagesh bansal <nageshbansal59@gmail.com>
2024-03-01 14:51:55 +05:30
Michael Morris c2f8f79ab9
Fix consider appKind when filtering target pods (#680)
* Fix consider appKind when filtering target pods

Signed-off-by: MichaelMorris <michael.morris@est.tech>

* Implemented review comment

Signed-off-by: MichaelMorris <michael.morris@est.tech>

---------

Signed-off-by: MichaelMorris <michael.morris@est.tech>
2024-03-01 14:41:29 +05:30
Nageshbansal 69927489d2
Fixes Probe logging for all iterations (#676)
* Fixes Probe logging for all iterations

Signed-off-by: nagesh bansal <nageshbansal59@gmail.com>
2024-01-11 17:48:26 +05:30
Shubham Chaudhary bdddd0d803
Add port blacklisting in the pod-network faults (#673)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-10-12 19:37:56 +05:30
Shubham Chaudhary 1b75f78632
fix(action): Fix the github release action (#672)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-09-29 16:02:01 +05:30
Calvinaud b710216113
Revert chaos when error during drain for node-drain experiments (#668)
- Added a call to uncordonNode in case of an error in the drainNode function

Signed-off-by: Calvin Audier <calvin.audier@gmail.com>
2023-09-21 23:54:33 +05:30
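The revert-on-failure pattern this commit describes (and that #730 above refines by reporting the revert error separately) looks roughly like the sketch below; `drainNode` and `uncordonNode` are hypothetical stand-ins for the experiment helpers.

```go
// revert_sketch.go — hypothetical sketch of revert-on-failure during node-drain chaos.
package main

import (
	"errors"
	"fmt"
)

func drainNode(node string) error    { return errors.New("drain timed out") } // simulated failure
func uncordonNode(node string) error { return nil }

// runDrainChaos drains the node and, if the drain fails, immediately tries to
// uncordon it again so the cluster is not left in a half-applied chaos state.
func runDrainChaos(node string) error {
	if err := drainNode(node); err != nil {
		if revertErr := uncordonNode(node); revertErr != nil {
			return errors.Join(err, revertErr)
		}
		return fmt.Errorf("drain failed (node uncordoned again): %w", err)
	}
	return nil
}

func main() {
	fmt.Println(runDrainChaos("worker-1"))
}
```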
Shubham Chaudhary 392ea29800
chore(network): fix the destination ips for network experiment for service mesh (#666)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-09-15 11:00:34 +05:30
Shubham Chaudhary db13d05e28
Add fix to remove the job labels from helper pod (#665)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-07-24 13:09:57 +05:30
Vedant Shrotria d737281985
Merge pull request #661 from Jonsy13/group-optional-litmus-go
Upgrading chaos-operator version for making group optional in k8s probe
2023-06-05 13:05:51 +05:30
Jonsy13 61751a9404
Added changes for operator upgrade
Signed-off-by: Jonsy13 <vedant.shrotria@harness.io>
2023-06-05 12:34:12 +05:30
Shubham Chaudhary d4f9826ea9
chore(fields): Updating optional fields to pointer type (#658)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-25 14:02:22 +05:30
Shubham Chaudhary 3ab28a5110
run workflow on dispatch event and use token from secrets (#657)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 01:10:08 +05:30
Shubham Chaudhary 3005d02c24
use the official snyk action (#656)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 01:01:09 +05:30
Shubham Chaudhary 1971b8093b
fix the snyk token name (#655)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:35:26 +05:30
Shubham Chaudhary e5a831f713
fix the github workflow (#654)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:29:54 +05:30
Shubham Chaudhary 95c9602019
adding security scan workflow (#653)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:24:53 +05:30
Shubham Chaudhary f36b0761aa
adding security scan workflow (#652)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:21:19 +05:30
Shubham Chaudhary d3b760d76d
chore(unit): Adding units to the duration fields (#650)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-18 13:40:10 +05:30
Shubham Chaudhary 0bbe8e23e7
Revert "probe comparator logging for all iterations (#646)" (#649)
This reverts commit 8e0bbbbd5d.
2023-04-18 01:01:48 +05:30
Neelanjan Manna 5ade71c694
chore(probe): Update Probe failure descriptions and error codes (#648)
* adds probe description changes

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-04-17 17:24:23 +05:30
Shubham Chaudhary 8e0bbbbd5d
probe comparator logging for all iterations (#646)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-17 11:24:47 +05:30
Shubham Chaudhary d0b36e9a50
fix(probe): ProbeSuccessPercentage should not be 100% if experiment terminated with Error (#645)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-10 15:17:51 +05:30
Shubham Chaudhary eee4421c3c
chore(sdk): Updating the sdk to latest experiment schema (#644)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-03-20 17:01:46 +05:30
Neelanjan Manna a1c85ca52c
chore(experiments): Replaces default container runtime to containerd (#640)
* replaces default container runtime to containerd

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-03-14 19:41:02 +05:30
Shubham Chaudhary f8b370e6f4
add the experiment phase as completed with error (#642)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-03-09 21:52:17 +05:30
Neelanjan Manna 04c031a281
updates http probe wait duration to ms (#643)
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-03-08 12:46:21 +05:30
Shubham Chaudhary ea2b83e1a0
adding backend compatibility to probe retry (#639)
* adding backend compatibility to probe retry

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating the chaos-operator version

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-22 10:03:56 +05:30
Shubham Chaudhary 291ae4a6ad
chore(error-verdict): Adding experiment verdict as error (#637)
* chore(error-verdict): Adding experiment verdict as error

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating error verdict

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating the chaos-operator version

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* adding comments and changing function name

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-21 23:37:56 +05:30
Akash Shrivastava 8b68c4b5cb
Added filtering vm instance by tag (#635)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2023-02-15 16:48:47 +05:30
Shubham Chaudhary 7bdb18016f
chore(probe): updating retries to attempt and use the timout for per attempt timeout (#636)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-09 17:02:31 +05:30
Shubham Chaudhary 4aa778ef9c
chore(probe-timeout): converting probe timeout in milli seconds (#634)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-05 01:34:39 +05:30
Shubham Chaudhary 1f02800c23
chore(parallel): add support to create unique runid for same timestamp (#633)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-20 11:11:12 +05:30
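Making run IDs unique for the same timestamp usually means appending a random suffix so two experiments started in the same instant still get distinct identifiers. A minimal sketch of that idea, assuming a hypothetical `newRunID` helper rather than the actual stringutils code:

```go
// runid_sketch.go — illustrative only; not the litmus-go stringutils implementation.
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"time"
)

const runIDCharset = "abcdefghijklmnopqrstuvwxyz0123456789"

// randomSuffix returns n random characters drawn from runIDCharset.
func randomSuffix(n int) string {
	b := make([]byte, n)
	for i := range b {
		idx, err := rand.Int(rand.Reader, big.NewInt(int64(len(runIDCharset))))
		if err != nil {
			// Fall back to a fixed character; good enough for an illustrative sketch.
			b[i] = runIDCharset[0]
			continue
		}
		b[i] = runIDCharset[idx.Int64()]
	}
	return string(b)
}

// newRunID appends a random suffix to the timestamp so that two runs created
// at the same second still differ.
func newRunID() string {
	return fmt.Sprintf("%d-%s", time.Now().Unix(), randomSuffix(6))
}

func main() {
	fmt.Println(newRunID(), newRunID()) // same second, different IDs
}
```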
Shubham Chaudhary 2134933c03
fix(stderr): adding the fix for cmd.Exec considers log.info as stderr (#632)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-10 21:58:02 +05:30
Shubham Chaudhary d151c8f1e0
chore(sidecar): adding sidecar to the helper pod (#630)
* chore(sidecar): adding sidecar to the helper pod

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* adding support for multiple sidecars

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(sidecar): adding env and envFrom fields

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-10 12:58:57 +05:30
Shubham Chaudhary 3622f505c9
chore(probe): Adding the root cause into probe description (#628)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-09 15:15:14 +05:30
Shubham Chaudhary dc9283614b
chore(sdk): adding failstep and lib changes to sdk (#627)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-16 00:36:10 +05:30
Shubham Chaudhary 5eed28bf3f
fix(vulrn):fixing the security vulnerabilities (#617)
* fix(vulrn): fixing the security vulnerabilities

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-15 17:22:13 +05:30
Shubham Chaudhary 77b30e221e
(chore): Adding user-friendly failsteps and removing non-litmus libs (#626)
* feat(failstep):  Adding failstep in all experiment and removed non-litmus libs

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-15 16:42:27 +05:30
Neelanjan Manna eb98d50855
fix(gcp-label-experiments): Fix label filtering logic (#593)
* fix(gcp-label-experiments): fix label filter logic

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-11-24 19:27:46 +05:30
Akash Shrivastava 3e72bb14e9
changed dd to use nsenter (#605)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-24 11:02:36 +05:30
Shubham Chaudhary 115ec45339
fix(pod-delete): fixing pod-delete experiment and refactor workload utils (#610)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-22 17:29:33 +05:30
Shubham Chaudhary 0e18911da6
chore(spring-boot): add spring-boot all faults option and remove duplicate code (#609)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-21 23:39:32 +05:30
Shubham Chaudhary e1eb389edf
Adding single helper and selectors changes to master (#608)
* feat(helper): adding single helper per node


Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-21 22:58:46 +05:30
Akash Shrivastava 39bbdbbf44
assigned msg var (#606)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2022-11-18 14:14:57 +05:30
Shubham Chaudhary ff285178d5
chore(spring-boot): simplifying spring boot experiments env (#604)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-18 11:34:41 +05:30
Soumya Ghosh Dastidar f16249f802
feat: add resource name filtering in k8s probe (#598)
* feat: add resource name filtering in k8s probe

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
2022-11-14 12:49:55 +05:30
Shubham Chaudhary 21969543bf
chore(spring-boot): spliting spring-boot-chaos experiment to separate experiments (#594)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-14 11:30:41 +05:30
Shubham Chaudhary 7140565204
chore(sudo): fixing sudo command (#595)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-07 21:03:09 +05:30
342 changed files with 13024 additions and 11634 deletions


@@ -12,19 +12,12 @@ jobs:
 # Install golang
 - uses: actions/setup-go@v2
 with:
-go-version: 1.17
+go-version: '1.20'
 - uses: actions/checkout@v2
 with:
 ref: ${{ github.event.pull_request.head.sha }}
-#TODO: Add Dockerfile linting
-# Running go-lint
-- name: Checking Go-Lint
-run : |
-sudo apt-get update && sudo apt-get install golint
-make gotasks
 - name: gofmt check
 run: |
 if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
@@ -33,25 +26,21 @@ jobs:
 gofmt -s -l .
 exit 1
 fi
-- name: golangci-lint
-uses: reviewdog/action-golangci-lint@v1
-security:
-container:
-image: litmuschaos/snyk:1.0
-volumes:
-- /home/runner/work/_actions/:/home/runner/work/_actions/
+- name: golangci-lint
+uses: reviewdog/action-golangci-lint@v1
+gitleaks-scan:
 runs-on: ubuntu-latest
 steps:
-- uses: actions/checkout@v2
-- uses: snyk/actions/setup@master
-- run: snyk auth ${SNYK_TOKEN}
-- uses: actions/setup-go@v1
-with:
-go-version: '1.17'
-- name: Snyk monitor
-run: snyk test
+- uses: actions/checkout@v3
+with:
+fetch-depth: 0
+- name: Run GitLeaks
+run: |
+wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz && \
+tar -zxvf gitleaks_8.18.2_linux_x64.tar.gz && \
+sudo mv gitleaks /usr/local/bin && gitleaks detect --source . -v
 build:
 needs: pre-checks
@@ -60,7 +49,7 @@ jobs:
 # Install golang
 - uses: actions/setup-go@v2
 with:
-go-version: 1.17
+go-version: '1.20'
 - uses: actions/checkout@v2
 with:
@@ -84,6 +73,7 @@ jobs:
 file: build/Dockerfile
 platforms: linux/amd64,linux/arm64
 tags: litmuschaos/go-runner:ci
+build-args: LITMUS_VERSION=3.10.0
 trivy:
 needs: pre-checks
@@ -95,8 +85,8 @@ jobs:
 - name: Build an image from Dockerfile
 run: |
-docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64
+docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.10.0
 - name: Run Trivy vulnerability scanner
 uses: aquasecurity/trivy-action@master
 with:
@@ -105,4 +95,4 @@ jobs:
 exit-code: '1'
 ignore-unfixed: true
 vuln-type: 'os,library'
 severity: 'CRITICAL,HIGH'


@@ -13,16 +13,9 @@ jobs:
 # Install golang
 - uses: actions/setup-go@v2
 with:
-go-version: 1.17
+go-version: '1.20'
 - uses: actions/checkout@v2
-#TODO: Add Dockerfile linting
-# Running go-lint
-- name: Checking Go-Lint
-run : |
-sudo apt-get update && sudo apt-get install golint
-make gotasks
 - name: gofmt check
 run: |
 if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
@@ -31,9 +24,9 @@ jobs:
 gofmt -s -l .
 exit 1
 fi
 - name: golangci-lint
 uses: reviewdog/action-golangci-lint@v1
 push:
 needs: pre-checks
@@ -43,7 +36,7 @@ jobs:
 # Install golang
 - uses: actions/setup-go@v2
 with:
-go-version: 1.17
+go-version: '1.20'
 - uses: actions/checkout@v2
 - name: Set up QEMU
@@ -70,3 +63,4 @@ jobs:
 file: build/Dockerfile
 platforms: linux/amd64,linux/arm64
 tags: litmuschaos/go-runner:ci
+build-args: LITMUS_VERSION=3.10.0


@@ -8,29 +8,21 @@ on:
 jobs:
 pre-checks:
 runs-on: ubuntu-latest
-if: ${{ startsWith(github.ref, 'refs/tags/') }}
 steps:
 # Install golang
 - uses: actions/setup-go@v2
 with:
-go-version: 1.17
+go-version: '1.20'
 - uses: actions/checkout@v2
-#TODO: Add Dockerfile linting
-# Running go-lint
-- name: Checking Go-Lint
-run : |
-sudo apt-get update && sudo apt-get install golint
-make gotasks
 push:
 needs: pre-checks
+if: ${{ startsWith(github.ref, 'refs/tags/') }}
 runs-on: ubuntu-latest
 steps:
 # Install golang
 - uses: actions/setup-go@v2
 with:
-go-version: 1.17
+go-version: '1.20'
 - uses: actions/checkout@v2
 - name: Set Tag
@@ -43,7 +35,7 @@ jobs:
 run: |
 echo "RELEASE TAG: ${RELEASE_TAG}"
 echo "${RELEASE_TAG}" > ${{ github.workspace }}/tag.txt
 - name: Set up QEMU
 uses: docker/setup-qemu-action@v1
 with:
@@ -63,10 +55,11 @@ jobs:
 - name: Build and push
 uses: docker/build-push-action@v2
 env:
 RELEASE_TAG: ${{ env.RELEASE_TAG }}
 with:
 push: true
 file: build/Dockerfile
 platforms: linux/amd64,linux/arm64
 tags: litmuschaos/go-runner:${{ env.RELEASE_TAG }},litmuschaos/go-runner:latest
+build-args: LITMUS_VERSION=3.10.0


@@ -9,215 +9,15 @@ on:
 - '**.yaml'
 jobs:
# Helm_Install_Generic_Tests:
# runs-on: ubuntu-18.04
# steps:
# - uses: actions/checkout@v2
# with:
# ref: ${{ github.event.pull_request.head.sha }}
# - name: Generate go binary and build docker image
# run: make build-amd64
# #Install and configure a kind cluster
# - name: Installing KinD cluster for the test
# uses: engineerd/setup-kind@v0.5.0
# with:
# version: "v0.7.0"
# config: "build/kind-cluster/kind-config.yaml"
# - name: Configuring and testing the Installation
# run: |
# kubectl taint nodes kind-control-plane node-role.kubernetes.io/master-
# kind get kubeconfig --internal >$HOME/.kube/config
# kubectl cluster-info --context kind-kind
# kubectl get nodes
# - name: Load docker image
# run: /usr/local/bin/kind load docker-image litmuschaos/go-runner:ci
# - name: Deploy a sample application for chaos injection
# run: |
# kubectl apply -f https://raw.githubusercontent.com/litmuschaos/chaos-ci-lib/master/app/nginx.yml
# kubectl wait --for=condition=Ready pods --all --namespace default --timeout=90s
# - name: Setting up kubeconfig ENV for Github Chaos Action
# run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
# env:
# ACTIONS_ALLOW_UNSECURE_COMMANDS: true
# - name: Setup Litmus
# uses: litmuschaos/github-chaos-actions@master
# env:
# INSTALL_LITMUS: true
# - name: Running Litmus pod delete chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-delete
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# - name: Running container kill chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: container-kill
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# CONTAINER_RUNTIME: containerd
# - name: Running node-cpu-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: node-cpu-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# - name: Running node-memory-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: node-memory-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# - name: Running pod-cpu-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-cpu-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# CPU_CORES: 1
# - name: Running pod-memory-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-memory-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# MEMORY_CONSUMPTION: 500
# - name: Running pod network corruption chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-corruption
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# CONTAINER_RUNTIME: containerd
# - name: Running pod network duplication chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-duplication
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# CONTAINER_RUNTIME: containerd
# - name: Running pod-network-latency chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-latency
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# NETWORK_LATENCY: 60000
# CONTAINER_RUNTIME: containerd
# - name: Running pod-network-loss chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-loss
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# NETWORK_PACKET_LOSS_PERCENTAGE: 100
# CONTAINER_RUNTIME: containerd
# - name: Running pod autoscaler chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-autoscaler
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TOTAL_CHAOS_DURATION: 60
# - name: Running node-io-stress chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: node-io-stress
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TOTAL_CHAOS_DURATION: 120
# FILESYSTEM_UTILIZATION_PERCENTAGE: 10
# - name: Uninstall Litmus
# uses: litmuschaos/github-chaos-actions@master
# env:
# LITMUS_CLEANUP: true
# - name: Deleting KinD cluster
# if: always()
# run: kind delete cluster
 Pod_Level_In_Serial_Mode:
 runs-on: ubuntu-latest
 steps:
 # Install golang
-- uses: actions/setup-go@v2
+- uses: actions/setup-go@v5
 with:
-go-version: '1.17'
+go-version: '1.20'
 - uses: actions/checkout@v2
 with:
@@ -226,94 +26,16 @@ jobs:
 - name: Generating Go binary and Building docker image
 run: |
 make build-amd64
#Install and configure a kind cluster
- name: Installing Prerequisites (K3S Cluster)
env:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.11+k3s1 sh -s - --docker --write-kubeconfig-mode 664
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in series mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-series
- name: Deleting K3S cluster
if: always()
run: /usr/local/bin/k3s-uninstall.sh
+- name: Install KinD
+run: |
+curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
+chmod +x ./kind
+mv ./kind /usr/local/bin/kind
-Pod_Level_In_Parallel_Mode:
-runs-on: ubuntu-latest
-steps:
-# Install golang
-- uses: actions/setup-go@v2
-with:
-go-version: '1.17'
-- uses: actions/checkout@v2
-with:
-ref: ${{ github.event.pull_request.head.sha }}
-- name: Generating Go binary and Building docker image
-run: |
-make build-amd64
-#Install and configure a kind cluster
-- name: Installing Prerequisites (K3S Cluster)
env:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.11+k3s1 sh -s - --docker --write-kubeconfig-mode 664
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in parallel mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-parallel
- name: Deleting K3S cluster
if: always()
run: /usr/local/bin/k3s-uninstall.sh
Node_Level_Tests:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '1.17'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
 - name: Create KinD Cluster
-run: kind create cluster --config build/kind-cluster/kind-config.yaml
+run: |
+kind create cluster --config build/kind-cluster/kind-config.yaml
 - name: Configuring and testing the Installation
 run: |
@@ -324,7 +46,123 @@ jobs:
 - name: Load image on the nodes of the cluster
 run: |
 kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in series mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-series
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
Pod_Level_In_Parallel_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
env:
KUBECONFIG: /home/runner/.kube/config
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in parallel mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-parallel
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
Node_Level_Tests:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
 - uses: actions/checkout@v2
 with:
@@ -355,4 +193,6 @@ jobs:
 - name: Deleting KinD cluster
 if: always()
-run: kind delete cluster
+run: |
+kubectl get nodes
+kind delete cluster

.github/workflows/security-scan.yml (new file)

@ -0,0 +1,27 @@
---
name: Security Scan
on:
workflow_dispatch:
jobs:
trivy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Build an image from Dockerfile
run: |
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: 'docker.io/litmuschaos/go-runner:${{ github.sha }}'
format: 'table'
exit-code: '1'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'


@@ -31,7 +31,7 @@ deps: _build_check_docker
 _build_check_docker:
 @echo "------------------"
 @echo "--> Check the Docker deps"
 @echo "------------------"
 @if [ $(IS_DOCKER_INSTALLED) -eq 1 ]; \
 then echo "" \
@@ -56,7 +56,7 @@ unused-package-check:
 .PHONY: docker.buildx
 docker.buildx:
 @echo "------------------------------"
 @echo "--> Setting up Builder "
 @echo "------------------------------"
 @if ! docker buildx ls | grep -q multibuilder; then\
 docker buildx create --name multibuilder;\
@@ -69,27 +69,27 @@ push: docker.buildx image-push
 image-push:
 @echo "------------------------"
 @echo "--> Push go-runner image"
 @echo "------------------------"
 @echo "Pushing $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)"
-@docker buildx build . --push --file build/Dockerfile --progress plane --platform linux/arm64,linux/amd64 --no-cache --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
+@docker buildx build . --push --file build/Dockerfile --progress plain --platform linux/arm64,linux/amd64 --no-cache --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
 .PHONY: build-amd64
 build-amd64:
 @echo "-------------------------"
 @echo "--> Build go-runner image"
 @echo "-------------------------"
-@sudo docker build --file build/Dockerfile --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) . --build-arg TARGETARCH=amd64
+@sudo docker build --file build/Dockerfile --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
 .PHONY: push-amd64
 push-amd64:
 @echo "------------------------------"
 @echo "--> Pushing image"
 @echo "------------------------------"
 @sudo docker push $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
 .PHONY: trivy-check
 trivy-check:


@ -1,7 +1,11 @@
package main package main
import ( import (
"context"
"errors"
"flag" "flag"
"os"
// Uncomment to load all auth plugins // Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth" // _ "k8s.io/client-go/plugin/pkg/client/auth"
@ -11,6 +15,8 @@ import (
// _ "k8s.io/client-go/plugin/pkg/client/auth/oidc" // _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
// _ "k8s.io/client-go/plugin/pkg/client/auth/openstack" // _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"
"go.opentelemetry.io/otel"
awsSSMChaosByID "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-id/experiment" awsSSMChaosByID "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-id/experiment"
awsSSMChaosByTag "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-tag/experiment" awsSSMChaosByTag "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-tag/experiment"
azureDiskLoss "github.com/litmuschaos/litmus-go/experiments/azure/azure-disk-loss/experiment" azureDiskLoss "github.com/litmuschaos/litmus-go/experiments/azure/azure-disk-loss/experiment"
@ -51,16 +57,19 @@ import (
podNetworkLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-latency/experiment" podNetworkLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-latency/experiment"
podNetworkLoss "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-loss/experiment" podNetworkLoss "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-loss/experiment"
podNetworkPartition "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-partition/experiment" podNetworkPartition "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-partition/experiment"
podNetworkRateLimit "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-rate-limit/experiment"
kafkaBrokerPodFailure "github.com/litmuschaos/litmus-go/experiments/kafka/kafka-broker-pod-failure/experiment" kafkaBrokerPodFailure "github.com/litmuschaos/litmus-go/experiments/kafka/kafka-broker-pod-failure/experiment"
ebsLossByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-id/experiment" ebsLossByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-id/experiment"
ebsLossByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-tag/experiment" ebsLossByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-tag/experiment"
ec2TerminateByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-id/experiment" ec2TerminateByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-id/experiment"
ec2TerminateByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-tag/experiment" ec2TerminateByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-tag/experiment"
springBootChaos "github.com/litmuschaos/litmus-go/experiments/spring-boot/spring-boot-chaos/experiment" rdsInstanceStop "github.com/litmuschaos/litmus-go/experiments/kube-aws/rds-instance-stop/experiment"
k6Loadgen "github.com/litmuschaos/litmus-go/experiments/load/k6-loadgen/experiment"
springBootFaults "github.com/litmuschaos/litmus-go/experiments/spring-boot/spring-boot-faults/experiment"
vmpoweroff "github.com/litmuschaos/litmus-go/experiments/vmware/vm-poweroff/experiment" vmpoweroff "github.com/litmuschaos/litmus-go/experiments/vmware/vm-poweroff/experiment"
cli "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
) )
@ -74,8 +83,25 @@ func init() {
} }
func main() { func main() {
initCtx := context.Background()
clients := clients.ClientSets{} // Set up Observability.
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
shutdown, err := telemetry.InitOTelSDK(initCtx, true, otelExporterEndpoint)
if err != nil {
log.Errorf("Failed to initialize OTel SDK: %v", err)
return
}
defer func() {
err = errors.Join(err, shutdown(initCtx))
}()
initCtx = telemetry.GetTraceParentContext()
}
clients := cli.ClientSets{}
ctx, span := otel.Tracer(telemetry.TracerName).Start(initCtx, "ExecuteExperiment")
defer span.End()
// parse the experiment name // parse the experiment name
experimentName := flag.String("name", "pod-delete", "name of the chaos experiment") experimentName := flag.String("name", "pod-delete", "name of the chaos experiment")
@ -88,102 +114,108 @@ func main() {
log.Infof("Experiment Name: %v", *experimentName) log.Infof("Experiment Name: %v", *experimentName)
// invoke the corresponding experiment based on the the (-name) flag // invoke the corresponding experiment based on the (-name) flag
switch *experimentName { switch *experimentName {
case "container-kill": case "container-kill":
containerKill.ContainerKill(clients) containerKill.ContainerKill(ctx, clients)
case "disk-fill": case "disk-fill":
diskFill.DiskFill(clients) diskFill.DiskFill(ctx, clients)
case "kafka-broker-pod-failure": case "kafka-broker-pod-failure":
kafkaBrokerPodFailure.KafkaBrokerPodFailure(clients) kafkaBrokerPodFailure.KafkaBrokerPodFailure(ctx, clients)
case "kubelet-service-kill": case "kubelet-service-kill":
kubeletServiceKill.KubeletServiceKill(clients) kubeletServiceKill.KubeletServiceKill(ctx, clients)
case "docker-service-kill": case "docker-service-kill":
dockerServiceKill.DockerServiceKill(clients) dockerServiceKill.DockerServiceKill(ctx, clients)
case "node-cpu-hog": case "node-cpu-hog":
nodeCPUHog.NodeCPUHog(clients) nodeCPUHog.NodeCPUHog(ctx, clients)
case "node-drain": case "node-drain":
nodeDrain.NodeDrain(clients) nodeDrain.NodeDrain(ctx, clients)
case "node-io-stress": case "node-io-stress":
nodeIOStress.NodeIOStress(clients) nodeIOStress.NodeIOStress(ctx, clients)
case "node-memory-hog": case "node-memory-hog":
nodeMemoryHog.NodeMemoryHog(clients) nodeMemoryHog.NodeMemoryHog(ctx, clients)
case "node-taint": case "node-taint":
nodeTaint.NodeTaint(clients) nodeTaint.NodeTaint(ctx, clients)
case "pod-autoscaler": case "pod-autoscaler":
podAutoscaler.PodAutoscaler(clients) podAutoscaler.PodAutoscaler(ctx, clients)
case "pod-cpu-hog-exec": case "pod-cpu-hog-exec":
podCPUHogExec.PodCPUHogExec(clients) podCPUHogExec.PodCPUHogExec(ctx, clients)
case "pod-delete": case "pod-delete":
podDelete.PodDelete(clients) podDelete.PodDelete(ctx, clients)
case "pod-io-stress": case "pod-io-stress":
podIOStress.PodIOStress(clients) podIOStress.PodIOStress(ctx, clients)
case "pod-memory-hog-exec": case "pod-memory-hog-exec":
podMemoryHogExec.PodMemoryHogExec(clients) podMemoryHogExec.PodMemoryHogExec(ctx, clients)
case "pod-network-corruption": case "pod-network-corruption":
podNetworkCorruption.PodNetworkCorruption(clients) podNetworkCorruption.PodNetworkCorruption(ctx, clients)
case "pod-network-duplication": case "pod-network-duplication":
podNetworkDuplication.PodNetworkDuplication(clients) podNetworkDuplication.PodNetworkDuplication(ctx, clients)
case "pod-network-latency": case "pod-network-latency":
podNetworkLatency.PodNetworkLatency(clients) podNetworkLatency.PodNetworkLatency(ctx, clients)
case "pod-network-loss": case "pod-network-loss":
podNetworkLoss.PodNetworkLoss(clients) podNetworkLoss.PodNetworkLoss(ctx, clients)
case "pod-network-partition": case "pod-network-partition":
podNetworkPartition.PodNetworkPartition(clients) podNetworkPartition.PodNetworkPartition(ctx, clients)
case "pod-network-rate-limit":
podNetworkRateLimit.PodNetworkRateLimit(ctx, clients)
case "pod-memory-hog": case "pod-memory-hog":
podMemoryHog.PodMemoryHog(clients) podMemoryHog.PodMemoryHog(ctx, clients)
case "pod-cpu-hog": case "pod-cpu-hog":
podCPUHog.PodCPUHog(clients) podCPUHog.PodCPUHog(ctx, clients)
case "cassandra-pod-delete": case "cassandra-pod-delete":
cassandraPodDelete.CasssandraPodDelete(clients) cassandraPodDelete.CasssandraPodDelete(ctx, clients)
case "aws-ssm-chaos-by-id": case "aws-ssm-chaos-by-id":
awsSSMChaosByID.AWSSSMChaosByID(clients) awsSSMChaosByID.AWSSSMChaosByID(ctx, clients)
case "aws-ssm-chaos-by-tag": case "aws-ssm-chaos-by-tag":
awsSSMChaosByTag.AWSSSMChaosByTag(clients) awsSSMChaosByTag.AWSSSMChaosByTag(ctx, clients)
case "ec2-terminate-by-id": case "ec2-terminate-by-id":
ec2TerminateByID.EC2TerminateByID(clients) ec2TerminateByID.EC2TerminateByID(ctx, clients)
case "ec2-terminate-by-tag": case "ec2-terminate-by-tag":
ec2TerminateByTag.EC2TerminateByTag(clients) ec2TerminateByTag.EC2TerminateByTag(ctx, clients)
case "ebs-loss-by-id": case "ebs-loss-by-id":
ebsLossByID.EBSLossByID(clients) ebsLossByID.EBSLossByID(ctx, clients)
case "ebs-loss-by-tag": case "ebs-loss-by-tag":
ebsLossByTag.EBSLossByTag(clients) ebsLossByTag.EBSLossByTag(ctx, clients)
case "rds-instance-stop":
rdsInstanceStop.RDSInstanceStop(ctx, clients)
case "node-restart": case "node-restart":
nodeRestart.NodeRestart(clients) nodeRestart.NodeRestart(ctx, clients)
case "pod-dns-error": case "pod-dns-error":
podDNSError.PodDNSError(clients) podDNSError.PodDNSError(ctx, clients)
case "pod-dns-spoof": case "pod-dns-spoof":
podDNSSpoof.PodDNSSpoof(clients) podDNSSpoof.PodDNSSpoof(ctx, clients)
case "pod-http-latency": case "pod-http-latency":
podHttpLatency.PodHttpLatency(clients) podHttpLatency.PodHttpLatency(ctx, clients)
case "pod-http-status-code": case "pod-http-status-code":
podHttpStatusCode.PodHttpStatusCode(clients) podHttpStatusCode.PodHttpStatusCode(ctx, clients)
case "pod-http-modify-header": case "pod-http-modify-header":
podHttpModifyHeader.PodHttpModifyHeader(clients) podHttpModifyHeader.PodHttpModifyHeader(ctx, clients)
case "pod-http-modify-body": case "pod-http-modify-body":
podHttpModifyBody.PodHttpModifyBody(clients) podHttpModifyBody.PodHttpModifyBody(ctx, clients)
case "pod-http-reset-peer": case "pod-http-reset-peer":
podHttpResetPeer.PodHttpResetPeer(clients) podHttpResetPeer.PodHttpResetPeer(ctx, clients)
case "vm-poweroff": case "vm-poweroff":
vmpoweroff.VMPoweroff(clients) vmpoweroff.VMPoweroff(ctx, clients)
case "azure-instance-stop": case "azure-instance-stop":
azureInstanceStop.AzureInstanceStop(clients) azureInstanceStop.AzureInstanceStop(ctx, clients)
case "azure-disk-loss": case "azure-disk-loss":
azureDiskLoss.AzureDiskLoss(clients) azureDiskLoss.AzureDiskLoss(ctx, clients)
case "gcp-vm-disk-loss": case "gcp-vm-disk-loss":
gcpVMDiskLoss.VMDiskLoss(clients) gcpVMDiskLoss.VMDiskLoss(ctx, clients)
case "pod-fio-stress": case "pod-fio-stress":
podFioStress.PodFioStress(clients) podFioStress.PodFioStress(ctx, clients)
case "gcp-vm-instance-stop": case "gcp-vm-instance-stop":
gcpVMInstanceStop.VMInstanceStop(clients) gcpVMInstanceStop.VMInstanceStop(ctx, clients)
case "redfish-node-restart": case "redfish-node-restart":
redfishNodeRestart.NodeRestart(clients) redfishNodeRestart.NodeRestart(ctx, clients)
case "gcp-vm-instance-stop-by-label": case "gcp-vm-instance-stop-by-label":
gcpVMInstanceStopByLabel.GCPVMInstanceStopByLabel(clients) gcpVMInstanceStopByLabel.GCPVMInstanceStopByLabel(ctx, clients)
case "gcp-vm-disk-loss-by-label": case "gcp-vm-disk-loss-by-label":
gcpVMDiskLossByLabel.GCPVMDiskLossByLabel(clients) gcpVMDiskLossByLabel.GCPVMDiskLossByLabel(ctx, clients)
case "spring-boot-chaos": case "spring-boot-cpu-stress", "spring-boot-memory-stress", "spring-boot-exceptions", "spring-boot-app-kill", "spring-boot-faults", "spring-boot-latency":
springBootChaos.Experiment(clients) springBootFaults.Experiment(ctx, clients, *experimentName)
case "k6-loadgen":
k6Loadgen.Experiment(ctx, clients)
default: default:
log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *experimentName) log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *experimentName)
return return
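The pattern behind every row above is the same: each experiment entrypoint now receives the caller's context.Context as its first argument, so the trace and abort signal started by the runner can follow the fault logic. A minimal sketch of that calling convention, with runExperiment as a hypothetical stand-in rather than a function from this repository:

package main

import (
    "context"
    "fmt"
)

// runExperiment stands in for any per-fault entrypoint listed above
// (e.g. podDelete.PodDelete); the only signature change in this commit is
// that the caller's context is now the first argument.
func runExperiment(ctx context.Context, name string) {
    select {
    case <-ctx.Done():
        fmt.Printf("%s aborted before injection: %v\n", name, ctx.Err())
    default:
        fmt.Printf("injecting %s with the caller's trace context attached\n", name)
    }
}

func main() {
    ctx := context.Background()
    runExperiment(ctx, "pod-delete")
}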


@@ -1,7 +1,11 @@
 package main
 import (
+    "context"
+    "errors"
     "flag"
+    "os"
     // Uncomment to load all auth plugins
     // _ "k8s.io/client-go/plugin/pkg/client/auth"
@@ -17,10 +21,11 @@ import (
     networkChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/helper"
     dnsChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/pod-dns-chaos/helper"
     stressChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/helper"
-    "github.com/litmuschaos/litmus-go/pkg/clients"
+    cli "github.com/litmuschaos/litmus-go/pkg/clients"
     "github.com/litmuschaos/litmus-go/pkg/log"
+    "github.com/litmuschaos/litmus-go/pkg/telemetry"
     "github.com/sirupsen/logrus"
+    "go.opentelemetry.io/otel"
 )
 func init() {
@@ -33,8 +38,24 @@ func init() {
 }
 func main() {
+    ctx := context.Background()
+    // Set up Observability.
+    if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
+        shutdown, err := telemetry.InitOTelSDK(ctx, true, otelExporterEndpoint)
+        if err != nil {
+            log.Errorf("Failed to initialize OTel SDK: %v", err)
+            return
+        }
+        defer func() {
+            err = errors.Join(err, shutdown(ctx))
+        }()
+        ctx = telemetry.GetTraceParentContext()
+    }
-    clients := clients.ClientSets{}
+    clients := cli.ClientSets{}
+    ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "ExecuteExperimentHelper")
+    defer span.End()
     // parse the helper name
     helperName := flag.String("name", "", "name of the helper pod")
@@ -50,17 +71,17 @@ func main() {
     // invoke the corresponding helper based on the the (-name) flag
     switch *helperName {
     case "container-kill":
-        containerKill.Helper(clients)
+        containerKill.Helper(ctx, clients)
     case "disk-fill":
-        diskFill.Helper(clients)
+        diskFill.Helper(ctx, clients)
     case "dns-chaos":
-        dnsChaos.Helper(clients)
+        dnsChaos.Helper(ctx, clients)
     case "stress-chaos":
-        stressChaos.Helper(clients)
+        stressChaos.Helper(ctx, clients)
     case "network-chaos":
-        networkChaos.Helper(clients)
+        networkChaos.Helper(ctx, clients)
     case "http-chaos":
-        httpChaos.Helper(clients)
+        httpChaos.Helper(ctx, clients)
     default:
         log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *helperName)
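The helper binary now opens a root span before dispatching, using the tracer-name constant from pkg/telemetry. Below is a minimal, self-contained sketch of the same otel.Tracer(...).Start pattern; the tracer name and helperLogic function are assumptions for illustration, not repository code, and with no SDK registered the tracer is a harmless no-op:

package main

import (
    "context"
    "fmt"

    "go.opentelemetry.io/otel"
)

// helperLogic stands in for the per-fault helpers dispatched above; child
// spans started from ctx join the same trace as the parent span in main.
func helperLogic(ctx context.Context) {
    _, child := otel.Tracer("litmus-go-sketch").Start(ctx, "InjectFault")
    defer child.End()
    fmt.Println("helper running inside a traced context")
}

func main() {
    // Without an SDK configured, otel.Tracer returns a no-op tracer, so this
    // pattern stays safe when no OTLP endpoint is set.
    ctx, span := otel.Tracer("litmus-go-sketch").Start(context.Background(), "ExecuteExperimentHelper")
    defer span.End()

    helperLogic(ctx)
}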


@@ -1,6 +1,6 @@
 # Multi-stage docker build
 # Build stage
-FROM golang:1.17 AS builder
+FROM golang:1.22 AS builder
 ARG TARGETOS=linux
 ARG TARGETARCH
@@ -14,27 +14,99 @@ RUN export GOOS=${TARGETOS} && \
 RUN CGO_ENABLED=0 go build -o /output/experiments ./bin/experiment
 RUN CGO_ENABLED=0 go build -o /output/helpers ./bin/helper
-FROM alpine:3.15.0 AS dep
-# Install generally useful things
-RUN apk --update add \
-    sudo \
-    iproute2 \
-    iptables
 # Packaging stage
-# Image source: https://github.com/litmuschaos/test-tools/blob/master/custom/hardened-alpine/experiment/Dockerfile
-# The base image is non-root (have litmus user) with default litmus directory.
-FROM litmuschaos/experiment-alpine:2.14.0
+FROM registry.access.redhat.com/ubi9/ubi:9.4
 LABEL maintainer="LitmusChaos"
-COPY --from=builder /output/ /litmus
-COPY --from=dep /usr/bin/sudo /usr/bin/
-COPY --from=dep /usr/lib/sudo /usr/lib/sudo
-COPY --from=dep /sbin/tc /sbin/
-COPY --from=dep /sbin/iptables /sbin/
-#Copying Necessary Files
-COPY ./pkg/cloud/aws/common/ssm-docs/LitmusChaos-AWS-SSM-Docs.yml .
+ARG TARGETARCH
+ARG LITMUS_VERSION
+# Install generally useful things
+RUN yum install -y \
+    sudo \
+    sshpass \
+    procps \
+    openssh-clients
+# tc binary
+RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-6.2.0-5.el9.$(uname -m).rpm
+RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-tc-6.2.0-5.el9.$(uname -m).rpm
+# iptables
+RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iptables-libs-1.8.8-6.el9_1.$(uname -m).rpm
+RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-libs-1.8.8-6.el9.2.$(uname -m).rpm
+RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-1.8.8-6.el9.2.$(uname -m).rpm
+# stress-ng
+RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/Judy-1.0.5-28.el9.$(uname -m).rpm
+RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/stress-ng-0.14.00-2.el9.$(uname -m).rpm
+#Installing Kubectl
+ENV KUBE_LATEST_VERSION="v1.31.0"
+RUN curl -L https://storage.googleapis.com/kubernetes-release/release/${KUBE_LATEST_VERSION}/bin/linux/${TARGETARCH}/kubectl -o /usr/bin/kubectl && \
+    chmod 755 /usr/bin/kubectl
+#Installing crictl binaries
+RUN curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.31.1/crictl-v1.31.1-linux-${TARGETARCH}.tar.gz --output crictl-v1.31.1-linux-${TARGETARCH}.tar.gz && \
+    tar zxvf crictl-v1.31.1-linux-${TARGETARCH}.tar.gz -C /sbin && \
+    chmod 755 /sbin/crictl
+#Installing promql cli binaries
+RUN curl -L https://github.com/chaosnative/promql-cli/releases/download/3.0.0-beta6/promql_linux_${TARGETARCH} --output /usr/bin/promql && chmod 755 /usr/bin/promql
+#Installing pause cli binaries
+RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/pause-linux-${TARGETARCH} --output /usr/bin/pause && chmod 755 /usr/bin/pause
+#Installing dns_interceptor cli binaries
+RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/dns_interceptor --output /sbin/dns_interceptor && chmod 755 /sbin/dns_interceptor
+#Installing nsutil cli binaries
+RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil-linux-${TARGETARCH} --output /sbin/nsutil && chmod 755 /sbin/nsutil
+#Installing nsutil shared lib
+RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil_${TARGETARCH}.so --output /usr/local/lib/nsutil.so && chmod 755 /usr/local/lib/nsutil.so
+# Installing toxiproxy binaries
+RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/cli/cli/toxiproxy-cli-linux-${TARGETARCH}.tar.gz --output toxiproxy-cli-linux-${TARGETARCH}.tar.gz && \
+    tar zxvf toxiproxy-cli-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
+    chmod 755 /sbin/toxiproxy-cli
+RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/server/server/toxiproxy-server-linux-${TARGETARCH}.tar.gz --output toxiproxy-server-linux-${TARGETARCH}.tar.gz && \
+    tar zxvf toxiproxy-server-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
+    chmod 755 /sbin/toxiproxy-server
+ENV APP_USER=litmus
+ENV APP_DIR="/$APP_USER"
+ENV DATA_DIR="$APP_DIR/data"
+# The USERD_ID of user
+ENV APP_USER_ID=2000
+RUN useradd -s /bin/true -u $APP_USER_ID -m -d $APP_DIR $APP_USER
+# change to 0(root) group because openshift will run container with arbitrary uid as a member of root group
+RUN chgrp -R 0 "$APP_DIR" && chmod -R g=u "$APP_DIR"
+# Giving sudo to all users (required for almost all experiments)
+RUN echo 'ALL ALL=(ALL:ALL) NOPASSWD: ALL' >> /etc/sudoers
+WORKDIR $APP_DIR
+COPY --from=builder /output/ .
+COPY --from=docker:27.0.3 /usr/local/bin/docker /sbin/docker
+RUN chmod 755 /sbin/docker
+# Set permissions and ownership for the copied binaries
+RUN chmod 755 ./experiments ./helpers && \
+    chown ${APP_USER}:0 ./experiments ./helpers
+# Set ownership for binaries in /sbin and /usr/bin
+RUN chown ${APP_USER}:0 /sbin/* /usr/bin/* && \
+    chown root:root /usr/bin/sudo && \
+    chmod 4755 /usr/bin/sudo
+# Copying Necessary Files
+COPY ./pkg/cloud/aws/common/ssm-docs/LitmusChaos-AWS-SSM-Docs.yml ./LitmusChaos-AWS-SSM-Docs.yml
+RUN chown ${APP_USER}:0 ./LitmusChaos-AWS-SSM-Docs.yml && chmod 755 ./LitmusChaos-AWS-SSM-Docs.yml
+USER ${APP_USER}
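The packaging stage above bakes tc, iptables, stress-ng, kubectl, crictl, promql, pause, nsutil, toxiproxy and the docker CLI into the UBI-based image. A hedged sketch, not part of the repository, of a small Go smoke test that checks a few of those tools are resolvable on PATH inside the built image:

package main

import (
    "fmt"
    "os/exec"
)

// A hypothetical smoke test for the packaged image: confirm that the tools
// installed by the Dockerfile above can be found on PATH.
func main() {
    tools := []string{"tc", "iptables", "stress-ng", "kubectl", "crictl", "toxiproxy-cli", "promql", "sudo"}
    for _, tool := range tools {
        if path, err := exec.LookPath(tool); err != nil {
            fmt.Printf("MISSING %s: %v\n", tool, err)
        } else {
            fmt.Printf("found %s at %s\n", tool, path)
        }
    }
}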


@@ -1,7 +1,6 @@
+apiVersion: kind.x-k8s.io/v1alpha4
 kind: Cluster
-apiVersion: kind.x-k8s.io/v1alpha4
 nodes:
 - role: control-plane
 - role: worker
 - role: worker
-- role: worker


@@ -1,23 +1,28 @@
 package lib
 import (
+    "context"
     "os"
     "strings"
     "time"
     experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
-    clients "github.com/litmuschaos/litmus-go/pkg/clients"
+    "github.com/litmuschaos/litmus-go/pkg/clients"
     "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
     "github.com/litmuschaos/litmus-go/pkg/events"
     "github.com/litmuschaos/litmus-go/pkg/log"
     "github.com/litmuschaos/litmus-go/pkg/probe"
+    "github.com/litmuschaos/litmus-go/pkg/telemetry"
     "github.com/litmuschaos/litmus-go/pkg/types"
     "github.com/litmuschaos/litmus-go/pkg/utils/common"
-    "github.com/pkg/errors"
+    "github.com/palantir/stacktrace"
+    "go.opentelemetry.io/otel"
 )
 // InjectChaosInSerialMode will inject the aws ssm chaos in serial mode that is one after other
-func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
+func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
+    ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInSerialMode")
+    defer span.End()
     select {
     case <-inject:
@@ -46,7 +51,7 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
     ec2IDList := strings.Fields(ec2ID)
     commandId, err := ssm.SendSSMCommand(experimentsDetails, ec2IDList)
     if err != nil {
-        return errors.Errorf("fail to send ssm command, err: %v", err)
+        return stacktrace.Propagate(err, "failed to send ssm command")
     }
     //prepare commands for abort recovery
     experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
@@ -54,21 +59,21 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
     //wait for the ssm command to get in running state
     log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
     if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
-        return errors.Errorf("fail to start ssm command, err: %v", err)
+        return stacktrace.Propagate(err, "failed to start ssm command")
     }
     common.SetTargets(ec2ID, "injected", "EC2", chaosDetails)
     // run the probes during chaos
     if len(resultDetails.ProbeDetails) != 0 && i == 0 {
-        if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-            return err
+        if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+            return stacktrace.Propagate(err, "failed to run probes")
         }
     }
     //wait for the ssm command to get succeeded in the given chaos duration
     log.Info("[Wait]: Waiting for the ssm command to get completed")
     if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
-        return errors.Errorf("fail to send ssm command, err: %v", err)
+        return stacktrace.Propagate(err, "failed to send ssm command")
     }
     common.SetTargets(ec2ID, "reverted", "EC2", chaosDetails)
@@ -85,7 +90,9 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
 }
 // InjectChaosInParallelMode will inject the aws ssm chaos in parallel mode that is all at once
-func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
+func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
+    ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInParallelMode")
+    defer span.End()
     select {
     case <-inject:
@@ -110,7 +117,7 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
     log.Info("[Chaos]: Starting the ssm command")
     commandId, err := ssm.SendSSMCommand(experimentsDetails, instanceIDList)
     if err != nil {
-        return errors.Errorf("fail to send ssm command, err: %v", err)
+        return stacktrace.Propagate(err, "failed to send ssm command")
     }
     //prepare commands for abort recovery
     experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
@@ -119,14 +126,14 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
     //wait for the ssm command to get in running state
     log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
     if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
-        return errors.Errorf("fail to start ssm command, err: %v", err)
+        return stacktrace.Propagate(err, "failed to start ssm command")
     }
     }
     // run the probes during chaos
     if len(resultDetails.ProbeDetails) != 0 {
-        if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-            return err
+        if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+            return stacktrace.Propagate(err, "failed to run probes")
         }
     }
@@ -134,7 +141,7 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
     //wait for the ssm command to get succeeded in the given chaos duration
     log.Info("[Wait]: Waiting for the ssm command to get completed")
     if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
-        return errors.Errorf("fail to send ssm command, err: %v", err)
+        return stacktrace.Propagate(err, "failed to send ssm command")
     }
     }
@@ -159,14 +166,14 @@ func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, abort c
     case len(experimentsDetails.CommandIDs) != 0:
         for _, commandId := range experimentsDetails.CommandIDs {
             if err := ssm.CancelCommand(commandId, experimentsDetails.Region); err != nil {
-                log.Errorf("[Abort]: fail to cancle command, recovery failed, err: %v", err)
+                log.Errorf("[Abort]: Failed to cancel command, recovery failed: %v", err)
             }
         }
     default:
-        log.Info("[Abort]: No command found to cancle")
+        log.Info("[Abort]: No SSM Command found to cancel")
     }
     if err := ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region); err != nil {
-        log.Errorf("fail to delete ssm doc, err: %v", err)
+        log.Errorf("Failed to delete ssm document: %v", err)
     }
     log.Info("[Abort]: Chaos Revert Completed")
     os.Exit(1)
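The recurring change in this file swaps errors.Errorf wrapping for palantir/stacktrace, which keeps the original error as the cause and records the call site instead of flattening it into a new string. A minimal sketch of the difference; sendCommand is a hypothetical stand-in for ssm.SendSSMCommand:

package main

import (
    "errors"
    "fmt"

    "github.com/palantir/stacktrace"
)

// sendCommand is a hypothetical stand-in for ssm.SendSSMCommand.
func sendCommand() error {
    return errors.New("ThrottlingException: rate exceeded")
}

func main() {
    if err := sendCommand(); err != nil {
        // Propagate keeps err as the root cause and records this call site,
        // where errors.Errorf("..., err: %v", err) flattened it into a string.
        wrapped := stacktrace.Propagate(err, "failed to send ssm command")
        fmt.Println(wrapped)
        fmt.Println("root cause:", stacktrace.RootCause(wrapped))
    }
}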


@@ -1,6 +1,8 @@
 package ssm
 import (
+    "context"
+    "fmt"
     "os"
     "os/signal"
     "strings"
@@ -8,12 +10,15 @@ import (
     "github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
     experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
-    clients "github.com/litmuschaos/litmus-go/pkg/clients"
+    "github.com/litmuschaos/litmus-go/pkg/cerrors"
+    "github.com/litmuschaos/litmus-go/pkg/clients"
     "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
     "github.com/litmuschaos/litmus-go/pkg/log"
+    "github.com/litmuschaos/litmus-go/pkg/telemetry"
     "github.com/litmuschaos/litmus-go/pkg/types"
     "github.com/litmuschaos/litmus-go/pkg/utils/common"
-    "github.com/pkg/errors"
+    "github.com/palantir/stacktrace"
+    "go.opentelemetry.io/otel"
 )
 var (
@@ -21,8 +26,10 @@ var (
     inject, abort chan os.Signal
 )
-//PrepareAWSSSMChaosByID contains the prepration and injection steps for the experiment
-func PrepareAWSSSMChaosByID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+// PrepareAWSSSMChaosByID contains the prepration and injection steps for the experiment
+func PrepareAWSSSMChaosByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+    ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSSSMFaultByID")
+    defer span.End()
     // inject channel is used to transmit signal notifications.
     inject = make(chan os.Signal, 1)
@@ -42,7 +49,7 @@ func PrepareAWSSSMChaosByID(experimentsDetails *experimentTypes.ExperimentDetail
     //create and upload the ssm document on the given aws service monitoring docs
     if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
-        return errors.Errorf("fail to create and upload ssm doc, err: %v", err)
+        return stacktrace.Propagate(err, "could not create and upload the ssm document")
     }
     experimentsDetails.IsDocsUploaded = true
     log.Info("[Info]: SSM docs uploaded successfully")
@@ -52,27 +59,27 @@ func PrepareAWSSSMChaosByID(experimentsDetails *experimentTypes.ExperimentDetail
     //get the instance id or list of instance ids
     instanceIDList := strings.Split(experimentsDetails.EC2InstanceID, ",")
-    if len(instanceIDList) == 0 {
-        return errors.Errorf("no instance id found for chaos injection")
+    if experimentsDetails.EC2InstanceID == "" || len(instanceIDList) == 0 {
+        return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
     }
     switch strings.ToLower(experimentsDetails.Sequence) {
     case "serial":
-        if err = lib.InjectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
-            return err
+        if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
+            return stacktrace.Propagate(err, "could not run chaos in serial mode")
         }
     case "parallel":
-        if err = lib.InjectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
-            return err
+        if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
+            return stacktrace.Propagate(err, "could not run chaos in parallel mode")
         }
     default:
-        return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+        return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
     }
     //Delete the ssm document on the given aws service monitoring docs
     err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
     if err != nil {
-        return errors.Errorf("fail to delete ssm doc, err: %v", err)
+        return stacktrace.Propagate(err, "failed to delete ssm doc")
     }
     //Waiting for the ramp time after chaos injection
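The added experimentsDetails.EC2InstanceID == "" guard matters because strings.Split on an empty string still returns one (empty) element, so the old len(instanceIDList) == 0 check could never fire. A small sketch of that behaviour:

package main

import (
    "fmt"
    "strings"
)

func main() {
    ids := strings.Split("", ",")
    fmt.Println(len(ids)) // 1, not 0
    fmt.Printf("%q\n", ids) // [""]

    // Hence the explicit empty-input guard added in this change:
    raw := ""
    if raw == "" || len(strings.Split(raw, ",")) == 0 {
        fmt.Println("no instance id found for chaos injection")
    }
}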


@@ -1,6 +1,8 @@
 package ssm
 import (
+    "context"
+    "fmt"
     "os"
     "os/signal"
     "strings"
@@ -8,16 +10,21 @@ import (
     "github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
     experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
-    clients "github.com/litmuschaos/litmus-go/pkg/clients"
+    "github.com/litmuschaos/litmus-go/pkg/cerrors"
+    "github.com/litmuschaos/litmus-go/pkg/clients"
     "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
     "github.com/litmuschaos/litmus-go/pkg/log"
+    "github.com/litmuschaos/litmus-go/pkg/telemetry"
     "github.com/litmuschaos/litmus-go/pkg/types"
     "github.com/litmuschaos/litmus-go/pkg/utils/common"
-    "github.com/pkg/errors"
+    "github.com/palantir/stacktrace"
+    "go.opentelemetry.io/otel"
 )
-//PrepareAWSSSMChaosByTag contains the prepration and injection steps for the experiment
-func PrepareAWSSSMChaosByTag(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+// PrepareAWSSSMChaosByTag contains the prepration and injection steps for the experiment
+func PrepareAWSSSMChaosByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+    ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultByTag")
+    defer span.End()
     // inject channel is used to transmit signal notifications.
     inject = make(chan os.Signal, 1)
@@ -37,7 +44,7 @@ func PrepareAWSSSMChaosByTag(experimentsDetails *experimentTypes.ExperimentDetai
     //create and upload the ssm document on the given aws service monitoring docs
     if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
-        return errors.Errorf("fail to create and upload ssm doc, err: %v", err)
+        return stacktrace.Propagate(err, "could not create and upload the ssm document")
     }
     experimentsDetails.IsDocsUploaded = true
     log.Info("[Info]: SSM docs uploaded successfully")
@@ -48,26 +55,26 @@ func PrepareAWSSSMChaosByTag(experimentsDetails *experimentTypes.ExperimentDetai
     log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIDList))
     if len(instanceIDList) == 0 {
-        return errors.Errorf("no instance id found for chaos injection")
+        return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
     }
     switch strings.ToLower(experimentsDetails.Sequence) {
     case "serial":
-        if err = lib.InjectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
-            return err
+        if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
+            return stacktrace.Propagate(err, "could not run chaos in serial mode")
         }
     case "parallel":
-        if err = lib.InjectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
-            return err
+        if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
+            return stacktrace.Propagate(err, "could not run chaos in parallel mode")
         }
     default:
-        return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+        return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
     }
     //Delete the ssm document on the given aws service monitoring docs
     err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
     if err != nil {
-        return errors.Errorf("fail to delete ssm doc, err: %v", err)
+        return stacktrace.Propagate(err, "failed to delete ssm doc")
     }
     //Waiting for the ramp time after chaos injection
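Target-selection failures in these files now return the project's typed cerrors.Error rather than a formatted string, so callers can branch on an error code instead of matching message text. The sketch below mirrors that idea with a self-contained type so it runs on its own; the field names ErrorCode and Reason follow the diff, everything else is assumed:

package main

import (
    "errors"
    "fmt"
)

// ErrorType and Error mirror the shape used by pkg/cerrors in the diff above;
// this is an illustrative stand-in, not the real package.
type ErrorType string

const ErrorTypeTargetSelection ErrorType = "TARGET_SELECTION"

type Error struct {
    ErrorCode ErrorType
    Reason    string
}

func (e Error) Error() string {
    return fmt.Sprintf("%s: %s", e.ErrorCode, e.Reason)
}

func selectTargets(instanceIDs []string) error {
    if len(instanceIDs) == 0 {
        return Error{ErrorCode: ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
    }
    return nil
}

func main() {
    err := selectTargets(nil)
    var terr Error
    // Callers can branch on the error code instead of string-matching messages.
    if errors.As(err, &terr) && terr.ErrorCode == ErrorTypeTargetSelection {
        fmt.Println("target selection failed:", terr.Reason)
    }
}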


@@ -1,6 +1,8 @@
 package lib
 import (
+    "context"
+    "fmt"
     "os"
     "os/signal"
     "strings"
@@ -9,16 +11,19 @@ import (
     "github.com/Azure/azure-sdk-for-go/profiles/latest/compute/mgmt/compute"
     experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/disk-loss/types"
-    clients "github.com/litmuschaos/litmus-go/pkg/clients"
+    "github.com/litmuschaos/litmus-go/pkg/cerrors"
+    "github.com/litmuschaos/litmus-go/pkg/clients"
     diskStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/disk"
     instanceStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
     "github.com/litmuschaos/litmus-go/pkg/events"
     "github.com/litmuschaos/litmus-go/pkg/log"
     "github.com/litmuschaos/litmus-go/pkg/probe"
+    "github.com/litmuschaos/litmus-go/pkg/telemetry"
     "github.com/litmuschaos/litmus-go/pkg/types"
     "github.com/litmuschaos/litmus-go/pkg/utils/common"
     "github.com/litmuschaos/litmus-go/pkg/utils/retry"
-    "github.com/pkg/errors"
+    "github.com/palantir/stacktrace"
+    "go.opentelemetry.io/otel"
 )
 var (
@@ -26,8 +31,10 @@ var (
     inject, abort chan os.Signal
 )
-//PrepareChaos contains the prepration and injection steps for the experiment
-func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+// PrepareChaos contains the prepration and injection steps for the experiment
+func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+    ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureDiskLossFault")
+    defer span.End()
     // inject channel is used to transmit signal notifications.
     inject = make(chan os.Signal, 1)
@@ -47,13 +54,13 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
     //get the disk name or list of disk names
     diskNameList := strings.Split(experimentsDetails.VirtualDiskNames, ",")
-    if len(diskNameList) == 0 {
-        return errors.Errorf("no volume names found to detach")
+    if experimentsDetails.VirtualDiskNames == "" || len(diskNameList) == 0 {
+        return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume names found to detach"}
     }
     instanceNamesWithDiskNames, err := diskStatus.GetInstanceNameForDisks(diskNameList, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup)
     if err != nil {
-        return errors.Errorf("error fetching attached instances for disks, err: %v", err)
+        return stacktrace.Propagate(err, "error fetching attached instances for disks")
     }
     // Get the instance name with attached disks
@@ -62,7 +69,7 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
     for instanceName := range instanceNamesWithDiskNames {
         attachedDisksWithInstance[instanceName], err = diskStatus.GetInstanceDiskList(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, experimentsDetails.ScaleSet, instanceName)
         if err != nil {
-            return errors.Errorf("error fetching virtual disks, err: %v", err)
+            return stacktrace.Propagate(err, "error fetching virtual disks")
         }
     }
@@ -77,15 +84,15 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
     switch strings.ToLower(experimentsDetails.Sequence) {
     case "serial":
-        if err = injectChaosInSerialMode(experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-            return err
+        if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+            return stacktrace.Propagate(err, "could not run chaos in serial mode")
        }
     case "parallel":
-        if err = injectChaosInParallelMode(experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-            return err
+        if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+            return stacktrace.Propagate(err, "could not run chaos in parallel mode")
        }
     default:
-        return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+        return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
     }
     //Waiting for the ramp time after chaos injection
@@ -97,8 +104,10 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
     return nil
 }
-// injectChaosInParallelMode will inject the azure disk loss chaos in parallel mode that is all at once
-func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+// injectChaosInParallelMode will inject the Azure disk loss chaos in parallel mode that is all at once
+func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+    ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInParallelMode")
+    defer span.End()
     //ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
     ChaosStartTimeStamp := time.Now()
@@ -107,7 +116,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
     for duration < experimentsDetails.ChaosDuration {
         if experimentsDetails.EngineName != "" {
-            msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on azure virtual disk"
+            msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disk"
             types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
             events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
         }
@@ -116,7 +125,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
         log.Info("[Chaos]: Detaching the virtual disks from the instances")
         for instanceName, diskNameList := range instanceNamesWithDiskNames {
             if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
-                return errors.Errorf("failed to detach disks, err: %v", err)
+                return stacktrace.Propagate(err, "failed to detach disks")
             }
         }
         // Waiting for disk to be detached
@@ -124,7 +133,7 @@
         for _, diskName := range diskNameList {
             log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
             if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
-                return errors.Errorf("disk attach check failed, err: %v", err)
+                return stacktrace.Propagate(err, "disk detachment check failed")
             }
         }
     }
@@ -137,8 +146,8 @@
         }
         // run the probes during chaos
         if len(resultDetails.ProbeDetails) != 0 {
-            if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-                return err
+            if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+                return stacktrace.Propagate(err, "failed to run probes")
             }
         }
@@ -150,24 +159,24 @@
         log.Info("[Chaos]: Attaching the Virtual disks back to the instances")
         for instanceName, diskNameList := range attachedDisksWithInstance {
             if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
-                return errors.Errorf("virtual disk attachment failed, err: %v", err)
+                return stacktrace.Propagate(err, "virtual disk attachment failed")
             }
+        }
         // Wait for disk to be attached
         for _, diskNameList := range instanceNamesWithDiskNames {
             for _, diskName := range diskNameList {
                 log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
                 if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
-                    return errors.Errorf("disk attach check failed, err: %v", err)
+                    return stacktrace.Propagate(err, "disk attachment check failed")
                 }
             }
         }
         // Updating the result details
         for _, diskNameList := range instanceNamesWithDiskNames {
             for _, diskName := range diskNameList {
                 common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
             }
         }
-        }
         duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@@ -175,8 +184,10 @@
     return nil
 }
-//injectChaosInSerialMode will inject the azure disk loss chaos in serial mode that is one after other
-func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+// injectChaosInSerialMode will inject the Azure disk loss chaos in serial mode that is one after other
+func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+    ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInSerialMode")
+    defer span.End()
     //ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
     ChaosStartTimeStamp := time.Now()
@@ -185,7 +196,7 @@
     for duration < experimentsDetails.ChaosDuration {
         if experimentsDetails.EngineName != "" {
-            msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on azure virtual disks"
+            msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disks"
             types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
             events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
         }
@@ -198,13 +209,13 @@
         // Detaching the virtual disks
         log.Infof("[Chaos]: Detaching %v from the instance", diskName)
         if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameToList); err != nil {
-            return errors.Errorf("failed to detach disks, err: %v", err)
+            return stacktrace.Propagate(err, "failed to detach disks")
         }
         // Waiting for disk to be detached
         log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
         if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
-            return errors.Errorf("disk detach check failed, err: %v", err)
+            return stacktrace.Propagate(err, "disk detachment check failed")
         }
         common.SetTargets(diskName, "detached", "VirtualDisk", chaosDetails)
@@ -212,8 +223,8 @@
         // run the probes during chaos
         // the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
         if len(resultDetails.ProbeDetails) != 0 && i == 0 {
-            if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-                return err
+            if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+                return stacktrace.Propagate(err, "failed to run probes")
             }
         }
@@ -224,13 +235,13 @@
         //Attaching the virtual disks to the instance
         log.Infof("[Chaos]: Attaching %v back to the instance", diskName)
         if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, attachedDisksWithInstance[instanceName]); err != nil {
-            return errors.Errorf("disk attachment failed, err: %v", err)
+            return stacktrace.Propagate(err, "disk attachment failed")
         }
         // Waiting for disk to be attached
         log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
         if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
-            return errors.Errorf("disk attach check failed, err: %v", err)
+            return stacktrace.Propagate(err, "disk attachment check failed")
         }
         common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
@@ -257,10 +268,10 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, attache
         Try(func(attempt uint) error {
             status, err := instanceStatus.GetAzureInstanceProvisionStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet)
             if err != nil {
-                return errors.Errorf("Failed to get instance, err: %v", err)
+                return stacktrace.Propagate(err, "failed to get instance")
             }
             if status != "Provisioning succeeded" {
-                return errors.Errorf("instance is updating, waiting for instance to finish update")
+                return stacktrace.Propagate(err, "instance is updating, waiting for instance to finish update")
             }
             return nil
         })
@@ -271,11 +282,11 @@
     for _, disk := range *diskList {
         diskStatusString, err := diskStatus.GetDiskStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, *disk.Name)
         if err != nil {
-            log.Errorf("Failed to get disk status, err: %v", err)
+            log.Errorf("Failed to get disk status: %v", err)
         }
         if diskStatusString != "Attached" {
             if err := diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskList); err != nil {
-                log.Errorf("failed to attach disk '%v, manual revert required, err: %v", err)
+                log.Errorf("Failed to attach disk, manual revert required: %v", err)
             } else {
                 common.SetTargets(*disk.Name, "re-attached", "VirtualDisk", chaosDetails)


@ -1,6 +1,8 @@
package lib package lib
import ( import (
"context"
"fmt"
"os" "os"
"os/signal" "os/signal"
"strings" "strings"
@ -8,15 +10,18 @@ import (
"time" "time"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/instance-stop/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/instance-stop/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
azureCommon "github.com/litmuschaos/litmus-go/pkg/cloud/azure/common" azureCommon "github.com/litmuschaos/litmus-go/pkg/cloud/azure/common"
azureStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance" azureStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
) )
var ( var (
@ -25,7 +30,9 @@ var (
) )
// PrepareAzureStop will initialize instanceNameList and start chaos injection based on sequence method selected // PrepareAzureStop will initialize instanceNameList and start chaos injection based on sequence method selected
func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareAzureStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications // inject channel is used to transmit signal notifications
inject = make(chan os.Signal, 1) inject = make(chan os.Signal, 1)
@ -44,8 +51,8 @@ func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, cli
// get the instance name or list of instance names // get the instance name or list of instance names
instanceNameList := strings.Split(experimentsDetails.AzureInstanceNames, ",") instanceNameList := strings.Split(experimentsDetails.AzureInstanceNames, ",")
if len(instanceNameList) == 0 { if experimentsDetails.AzureInstanceNames == "" || len(instanceNameList) == 0 {
return errors.Errorf("no instance name found to stop") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance name found to stop"}
} }
// watching for the abort signal and revert the chaos // watching for the abort signal and revert the chaos
@ -53,15 +60,15 @@ func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, cli
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
// Waiting for the ramp time after chaos injection // Waiting for the ramp time after chaos injection
@ -72,8 +79,11 @@ func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, cli
return nil return nil
} }
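PrepareAzureStop now replaces errors.Errorf with typed cerrors values and stacktrace.Propagate wrapping. A minimal sketch of that dispatch pattern, with hypothetical runSerial/runParallel callbacks standing in for the real injection functions:

package lib

import (
	"fmt"
	"strings"

	"github.com/litmuschaos/litmus-go/pkg/cerrors"
	"github.com/palantir/stacktrace"
)

// runChaos dispatches on the configured sequence; failures keep their call chain,
// while unsupported sequences surface as a typed generic error.
func runChaos(sequence string, runSerial, runParallel func() error) error {
	switch strings.ToLower(sequence) {
	case "serial":
		if err := runSerial(); err != nil {
			return stacktrace.Propagate(err, "could not run chaos in serial mode")
		}
	case "parallel":
		if err := runParallel(); err != nil {
			return stacktrace.Propagate(err, "could not run chaos in parallel mode")
		}
	default:
		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", sequence)}
	}
	return nil
}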
// injectChaosInSerialMode will inject the azure instance termination in serial mode that is one after the other // injectChaosInSerialMode will inject the Azure instance termination in serial mode that is one after the other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInSerialMode")
defer span.End()
select { select {
case <-inject: case <-inject:
// stopping the chaos execution, if abort signal received // stopping the chaos execution, if abort signal received
@ -88,7 +98,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Infof("[Info]: Target instanceName list, %v", instanceNameList) log.Infof("[Info]: Target instanceName list, %v", instanceNameList)
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on azure instance" msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails) types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine") events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
} }
@ -100,25 +110,25 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName) log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" { if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop the Azure instance, err: %v", err) return stacktrace.Propagate(err, "unable to stop the Azure instance")
} }
} else { } else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop the Azure instance, err: %v", err) return stacktrace.Propagate(err, "unable to stop the Azure instance")
} }
} }
// Wait for Azure instance to completely stop // Wait for Azure instance to completely stop
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName) log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance poweroff status check failed, err: %v", err) return stacktrace.Propagate(err, "instance poweroff status check failed")
} }
// Run the probes during chaos // Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration // the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 { if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "failed to run probes")
} }
} }
@ -130,18 +140,18 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Info("[Chaos]: Starting back the Azure instance") log.Info("[Chaos]: Starting back the Azure instance")
if experimentsDetails.ScaleSet == "enable" { if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err) return stacktrace.Propagate(err, "unable to start the Azure instance")
} }
} else { } else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err) return stacktrace.Propagate(err, "unable to start the Azure instance")
} }
} }
// Wait for Azure instance to get in running state // Wait for Azure instance to get in running state
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName) log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance power on status check failed, err: %v", err) return stacktrace.Propagate(err, "instance power on status check failed")
} }
} }
duration = int(time.Since(ChaosStartTimeStamp).Seconds()) duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@ -150,8 +160,11 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
return nil return nil
} }
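injectChaosInSerialMode and injectChaosInParallelMode now open an OpenTelemetry span on entry and close it on return, so each fault shows up in the trace tree. A minimal sketch of that wrapping, assuming telemetry.TracerName is the shared tracer name used across the repo:

package lib

import (
	"context"

	"github.com/litmuschaos/litmus-go/pkg/telemetry"
	"go.opentelemetry.io/otel"
)

// traced runs one injection step inside its own span.
func traced(ctx context.Context, spanName string, step func(context.Context) error) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, spanName)
	defer span.End()
	return step(ctx)
}

The diff inlines this shape in each injection function rather than going through a helper.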
// injectChaosInParallelMode will inject the azure instance termination in parallel mode that is all at once // injectChaosInParallelMode will inject the Azure instance termination in parallel mode that is all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInParallelMode")
defer span.End()
select { select {
case <-inject: case <-inject:
// Stopping the chaos execution, if abort signal received // Stopping the chaos execution, if abort signal received
@ -177,11 +190,11 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName) log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" { if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop azure instance, err: %v", err) return stacktrace.Propagate(err, "unable to stop Azure instance")
} }
} else { } else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop azure instance, err: %v", err) return stacktrace.Propagate(err, "unable to stop Azure instance")
} }
} }
} }
@ -190,14 +203,14 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, vmName := range instanceNameList { for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName) log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance poweroff status check failed, err: %v", err) return stacktrace.Propagate(err, "instance poweroff status check failed")
} }
} }
// Run probes during chaos // Run probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "failed to run probes")
} }
} }
@ -210,11 +223,11 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Infof("[Chaos]: Starting back the Azure instance: %v", vmName) log.Infof("[Chaos]: Starting back the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" { if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err) return stacktrace.Propagate(err, "unable to start the Azure instance")
} }
} else { } else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err) return stacktrace.Propagate(err, "unable to start the Azure instance")
} }
} }
} }
@ -223,7 +236,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, vmName := range instanceNameList { for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName) log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance power on status check failed, err: %v", err) return stacktrace.Propagate(err, "instance power on status check failed")
} }
} }
@ -248,22 +261,22 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
instanceState, err = azureStatus.GetAzureInstanceStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName) instanceState, err = azureStatus.GetAzureInstanceStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
} }
if err != nil { if err != nil {
log.Errorf("[Abort]: Fail to get instance status when an abort signal is received, err: %v", err) log.Errorf("[Abort]: Failed to get instance status when an abort signal is received: %v", err)
} }
if instanceState != "VM running" && instanceState != "VM starting" { if instanceState != "VM running" && instanceState != "VM starting" {
log.Info("[Abort]: Waiting for the Azure instance to get down") log.Info("[Abort]: Waiting for the Azure instance to get down")
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Instance power off status check failed, err: %v", err) log.Errorf("[Abort]: Instance power off status check failed: %v", err)
} }
log.Info("[Abort]: Starting Azure instance as abort signal received") log.Info("[Abort]: Starting Azure instance as abort signal received")
if experimentsDetails.ScaleSet == "enable" { if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance, err: %v", err) log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
} }
} else { } else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil { if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance, err: %v", err) log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
} }
} }
} }
@ -271,7 +284,7 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
log.Info("[Abort]: Waiting for the Azure instance to start") log.Info("[Abort]: Waiting for the Azure instance to start")
err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName) err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
if err != nil { if err != nil {
log.Errorf("[Abort]: Instance power on status check failed, err: %v", err) log.Errorf("[Abort]: Instance power on status check failed: %v", err)
log.Errorf("[Abort]: Azure instance %v failed to start after an abort signal is received", vmName) log.Errorf("[Abort]: Azure instance %v failed to start after an abort signal is received", vmName)
} }
} }
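The abort watcher above best-effort restores every stopped instance once an abort signal arrives. A compressed sketch of the signal plumbing, with startInstance as a hypothetical stand-in for the scale-set or standalone start call (the real watcher also waits for the power state to settle before and after starting):

package lib

import (
	"os"
	"os/signal"
	"syscall"

	"github.com/litmuschaos/litmus-go/pkg/log"
)

// watchForAbort blocks until SIGINT/SIGTERM, then tries to start every target back;
// revert errors are logged rather than returned so each instance gets a chance to recover.
func watchForAbort(instanceNames []string, startInstance func(string) error) {
	abort := make(chan os.Signal, 1)
	signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
	<-abort
	for _, name := range instanceNames {
		if err := startInstance(name); err != nil {
			log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
		}
	}
	os.Exit(1) // exit code convention assumed from other litmus abort watchers
}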

View File

@ -1,28 +1,38 @@
package helper package helper
import ( import (
"bytes"
"context" "context"
"fmt"
"os/exec" "os/exec"
"strconv" "strconv"
"strings"
"time" "time"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry" "github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types" clientTypes "k8s.io/apimachinery/pkg/types"
) )
var err error
// Helper injects the container-kill chaos // Helper injects the container-kill chaos
func Helper(clients clients.ClientSets) { func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateContainerKillFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{} experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{} eventsDetails := types.EventDetails{}
@ -33,14 +43,18 @@ func Helper(clients clients.ClientSets) {
log.Info("[PreReq]: Getting the ENV variables") log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails) getENV(&experimentsDetails)
// Intialise the chaos attributes // Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails) types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise Chaos Result Parameters // Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails) types.SetResultAttributes(&resultDetails, chaosDetails)
err := killContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails) if err := killContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
if err != nil { // update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err) log.Fatalf("helper pod failed, err: %v", err)
} }
} }
@ -49,6 +63,33 @@ func Helper(clients clients.ClientSets) {
// it will kill the container till the chaos duration // it will kill the container till the chaos duration
// the execution will stop after timestamp passes the given chaos duration // the execution will stop after timestamp passes the given chaos duration
func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error { func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
targets = append(targets, td)
log.Infof("Injecting chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
if err := killIterations(targets, experimentsDetails, clients, eventsDetails, chaosDetails, resultDetails); err != nil {
return err
}
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil
}
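killContainer now receives every target in one shot instead of a single APP_POD/APP_CONTAINER pair: the experiment packs them into a TARGETS value of "podName:namespace:container" entries joined with ";" (see getPodEnv further down), and common.ParseTargets unpacks them on the helper side. A hand-rolled stand-in for that parsing, reusing the targetDetails struct defined at the end of this file:

package helper

import (
	"fmt"
	"strings"
)

// parseTargets splits "pod:ns:container;pod:ns:container" into targetDetails values.
// Illustration only; the diff relies on common.ParseTargets.
func parseTargets(raw, source string) ([]targetDetails, error) {
	var targets []targetDetails
	for _, entry := range strings.Split(raw, ";") {
		parts := strings.Split(entry, ":")
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed target %q, expected pod:namespace:container", entry)
		}
		targets = append(targets, targetDetails{
			Name:            parts[0],
			Namespace:       parts[1],
			TargetContainer: parts[2],
			Source:          source,
		})
	}
	return targets, nil
}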
func killIterations(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin //ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now() ChaosStartTimeStamp := time.Now()
@ -56,43 +97,30 @@ func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, client
for duration < experimentsDetails.ChaosDuration { for duration < experimentsDetails.ChaosDuration {
//getRestartCount return the restart count of target container var containerIds []string
restartCountBefore, err := getRestartCount(experimentsDetails, experimentsDetails.TargetPods, clients)
if err != nil {
return err
}
//Obtain the container ID through Pod for _, t := range targets {
// this id will be used to select the container for the kill t.RestartCountBefore, err = getRestartCount(t, clients)
containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients) if err != nil {
if err != nil { return stacktrace.Propagate(err, "could not get container restart count")
return errors.Errorf("Unable to get the container id, %v", err)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": experimentsDetails.TargetPods,
"ContainerName": experimentsDetails.TargetContainer,
"RestartCountBefore": restartCountBefore,
})
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch experimentsDetails.ContainerRuntime {
case "docker":
if err := stopDockerContainer(containerID, experimentsDetails.SocketPath, experimentsDetails.Signal); err != nil {
return err
} }
case "containerd", "crio":
if err := stopContainerdContainer(containerID, experimentsDetails.SocketPath, experimentsDetails.Signal); err != nil { containerId, err := common.GetContainerID(t.Namespace, t.Name, t.TargetContainer, clients, t.Source)
return err if err != nil {
return stacktrace.Propagate(err, "could not get container id")
} }
default:
return errors.Errorf("%v container runtime not supported", experimentsDetails.ContainerRuntime) log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": t.Name,
"ContainerName": t.TargetContainer,
"RestartCountBefore": t.RestartCountBefore,
})
containerIds = append(containerIds, containerId)
}
if err := kill(experimentsDetails, containerIds, clients, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not kill target container")
} }
//Waiting for the chaos interval after chaos injection //Waiting for the chaos interval after chaos injection
@ -101,67 +129,93 @@ func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, client
common.WaitForDuration(experimentsDetails.ChaosInterval) common.WaitForDuration(experimentsDetails.ChaosInterval)
} }
//Check the status of restarted container for _, t := range targets {
err = common.CheckContainerStatus(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.Timeout, experimentsDetails.Delay, clients) if err := validate(t, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
if err != nil { return stacktrace.Propagate(err, "could not verify restart count")
return errors.Errorf("application container is not in running state, %v", err) }
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "targeted", "pod", t.Name); err != nil {
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
} }
// It will verify that the restart count of container should increase after chaos injection
err = verifyRestartCount(experimentsDetails, experimentsDetails.TargetPods, clients, restartCountBefore)
if err != nil {
return err
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds()) duration = int(time.Since(ChaosStartTimeStamp).Seconds())
} }
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "targeted", "pod", experimentsDetails.TargetPods); err != nil { return nil
}
func kill(experimentsDetails *experimentTypes.ExperimentDetails, containerIds []string, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch experimentsDetails.ContainerRuntime {
case "docker":
if err := stopDockerContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
case "containerd", "crio":
if err := stopContainerdContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName, experimentsDetails.Timeout); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported container runtime %s", experimentsDetails.ContainerRuntime)}
}
return nil
}
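Note the design choice in kill(): a context-deadline-exceeded failure from the runtime CLI is swallowed, on the assumption that the stop signal was already delivered before the API timeout fired. A tiny sketch of that guard on its own:

package helper

import "strings"

// tolerateDeadline turns a runtime-CLI timeout into a soft success;
// any other error is still reported.
func tolerateDeadline(err error) error {
	if err == nil || strings.Contains(err.Error(), "context deadline exceeded") {
		return nil
	}
	return err
}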
func validate(t targetDetails, timeout, delay int, clients clients.ClientSets) error {
//Check the status of restarted container
if err := common.CheckContainerStatus(t.Namespace, t.Name, timeout, delay, clients, t.Source); err != nil {
return err return err
} }
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil // It will verify that the restart count of container should increase after chaos injection
return verifyRestartCount(t, timeout, delay, clients, t.RestartCountBefore)
} }
//stopContainerdContainer kill the application container // stopContainerdContainer kill the application container
func stopContainerdContainer(containerID, socketPath, signal string) error { func stopContainerdContainer(containerIDs []string, socketPath, signal, source string, timeout int) error {
var errOut bytes.Buffer if signal != "SIGKILL" && signal != "SIGTERM" {
var cmd *exec.Cmd return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: source, Reason: fmt.Sprintf("unsupported signal %s, use either SIGTERM or SIGKILL", signal)}
endpoint := "unix://" + socketPath
switch signal {
case "SIGKILL":
cmd = exec.Command("sudo", "crictl", "-i", endpoint, "-r", endpoint, "stop", "--timeout=0", string(containerID))
case "SIGTERM":
cmd = exec.Command("sudo", "crictl", "-i", endpoint, "-r", endpoint, "stop", string(containerID))
default:
return errors.Errorf("{%v} signal not supported, use either SIGTERM or SIGKILL", signal)
} }
cmd.Stderr = &errOut
if err := cmd.Run(); err != nil { cmd := exec.Command("sudo", "crictl", "-i", fmt.Sprintf("unix://%s", socketPath), "-r", fmt.Sprintf("unix://%s", socketPath), "stop")
return errors.Errorf("Unable to run command, err: %v; error output: %v", err, errOut.String()) if signal == "SIGKILL" {
cmd.Args = append(cmd.Args, "--timeout=0")
} else if timeout != -1 {
cmd.Args = append(cmd.Args, fmt.Sprintf("--timeout=%v", timeout))
} }
return nil cmd.Args = append(cmd.Args, containerIDs...)
return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
} }
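For containerd and CRI-O the helper shells out to crictl: SIGKILL maps to --timeout=0 for an immediate stop, while SIGTERM honours CONTAINER_API_TIMEOUT when it is set. A sketch of the assembled command for the SIGKILL path (socket path and container IDs are illustrative placeholders):

package helper

import (
	"fmt"
	"os/exec"
)

// exampleCrictlStop shows the shape of the command stopContainerdContainer builds
// for SIGKILL against two containers on an assumed containerd socket.
func exampleCrictlStop() *exec.Cmd {
	socketPath := "/run/containerd/containerd.sock" // assumption: default containerd socket
	return exec.Command("sudo", "crictl",
		"-i", fmt.Sprintf("unix://%s", socketPath),
		"-r", fmt.Sprintf("unix://%s", socketPath),
		"stop", "--timeout=0",
		"container-id-1", "container-id-2") // hypothetical container IDs
}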
//stopDockerContainer kill the application container // stopDockerContainer kill the application container
func stopDockerContainer(containerID, socketPath, signal string) error { func stopDockerContainer(containerIDs []string, socketPath, signal, source string) error {
var errOut bytes.Buffer cmd := exec.Command("sudo", "docker", "--host", fmt.Sprintf("unix://%s", socketPath), "kill", "--signal", signal)
host := "unix://" + socketPath cmd.Args = append(cmd.Args, containerIDs...)
cmd := exec.Command("sudo", "docker", "--host", host, "kill", string(containerID), "--signal", signal) return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
cmd.Stderr = &errOut
if err := cmd.Run(); err != nil {
return errors.Errorf("Unable to run command, err: %v; error output: %v", err, errOut.String())
}
return nil
} }
//getRestartCount return the restart count of target container // getRestartCount return the restart count of target container
func getRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets) (int, error) { func getRestartCount(target targetDetails, clients clients.ClientSets) (int, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), podName, v1.GetOptions{}) pod, err := clients.GetPod(target.Namespace, target.Name, 180, 2)
if err != nil { if err != nil {
return 0, err return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", target.Name, target.Namespace), Reason: err.Error()}
} }
restartCount := 0 restartCount := 0
for _, container := range pod.Status.ContainerStatuses { for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer { if container.Name == target.TargetContainer {
restartCount = int(container.RestartCount) restartCount = int(container.RestartCount)
break break
} }
@ -169,39 +223,36 @@ func getRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podN
return restartCount, nil return restartCount, nil
} }
//verifyRestartCount verify the restart count of target container that it is restarted or not after chaos injection // verifyRestartCount verify the restart count of target container that it is restarted or not after chaos injection
func verifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, restartCountBefore int) error { func verifyRestartCount(t targetDetails, timeout, delay int, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0 restartCountAfter := 0
return retry. return retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)). Times(uint(timeout / delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second). Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error { Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), podName, v1.GetOptions{}) pod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil { if err != nil {
return errors.Errorf("Unable to find the pod with name %v, err: %v", podName, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
} }
for _, container := range pod.Status.ContainerStatuses { for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer { if container.Name == t.TargetContainer {
restartCountAfter = int(container.RestartCount) restartCountAfter = int(container.RestartCount)
break break
} }
} }
if restartCountAfter <= restartCountBefore { if restartCountAfter <= restartCountBefore {
return errors.Errorf("Target container is not restarted") return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: "target container is not restarted after kill"}
} }
log.Infof("restartCount of target container after chaos injection: %v", strconv.Itoa(restartCountAfter)) log.Infof("restartCount of target container after chaos injection: %v", strconv.Itoa(restartCountAfter))
return nil return nil
}) })
} }
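verifyRestartCount polls through the repo's retry builder: Times sets the attempt budget (timeout/delay), Wait the pause between attempts, and Try re-runs the closure until it returns nil or the attempts run out. A generic sketch of the same pattern for an arbitrary condition:

package helper

import (
	"fmt"
	"time"

	"github.com/litmuschaos/litmus-go/pkg/utils/retry"
)

// waitUntil re-runs check every delay seconds for up to timeout seconds.
func waitUntil(timeout, delay int, check func() (bool, error)) error {
	return retry.
		Times(uint(timeout / delay)).
		Wait(time.Duration(delay) * time.Second).
		Try(func(attempt uint) error {
			ok, err := check()
			if err != nil {
				return err
			}
			if !ok {
				return fmt.Errorf("condition not met yet (attempt %d)", attempt)
			}
			return nil
		})
}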
//getENV fetches all the env variables from the runner pod // getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) { func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "") experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "") experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10")) experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus") experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
@ -213,4 +264,17 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.Signal = types.Getenv("SIGNAL", "SIGKILL") experimentDetails.Signal = types.Getenv("SIGNAL", "SIGKILL")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2")) experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180")) experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.ContainerAPITimeout, _ = strconv.Atoi(types.Getenv("CONTAINER_API_TIMEOUT", "-1"))
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
RestartCountBefore int
Source string
}
func isContextDeadlineExceeded(err error) bool {
return strings.Contains(err.Error(), "context deadline exceeded")
} }

View File

@ -2,34 +2,40 @@ package lib
import ( import (
"context" "context"
"fmt"
"os"
"strconv" "strconv"
"strings" "strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1" apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
//PrepareContainerKill contains the prepration steps before chaos injection // PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareContainerKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareContainerKillFault")
defer span.End()
targetPodList := apiv1.PodList{}
var err error var err error
var podsAffectedPerc int
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
} }
//Setup the tunables if provided in range //Set up the tunables if provided in range
SetChaosTunables(experimentsDetails) SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The tunables are:", logrus.Fields{ log.InfoWithValues("[Info]: The tunables are:", logrus.Fields{
@ -37,33 +43,11 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
"Sequence": experimentsDetails.Sequence, "Sequence": experimentsDetails.Sequence,
}) })
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc) targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if experimentsDetails.NodeLabel == "" { if err != nil {
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails) return stacktrace.Propagate(err, "could not get target pods")
if err != nil {
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
} }
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime) log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@ -74,28 +58,28 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
if experimentsDetails.ChaosServiceAccount == "" { if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients) experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil { if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err) return stacktrace.Propagate(err, "could not get experiment service account")
} }
} }
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil { if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err return stacktrace.Propagate(err, "could not set helper data")
} }
} }
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "") experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -107,13 +91,12 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
} }
// injectChaosInSerialMode kill the container of all target application serially (one by one) // injectChaosInSerialMode kill the container of all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInSerialMode")
labelSuffix := common.GetRunID() defer span.End()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -123,112 +106,62 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{ runID := stringutils.GetRunID()
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName, if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
"ContainerName": experimentsDetails.TargetContainer, return stacktrace.Propagate(err, "could not create helper pod")
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
} }
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
} }
} }
return nil return nil
} }
// injectChaosInParallelMode kill the container of all target application in parallel mode (all at once) // injectChaosInParallelMode kill the container of all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInParallelMode")
labelSuffix := common.GetRunID() defer span.End()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
// creating the helper pod to perform container kill chaos runID := stringutils.GetRunID()
for _, pod := range targetPodList.Items { targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//Get the target container name of the application pod for node, tar := range targets {
if !experimentsDetails.IsTargetContainerProvided { var targetsPerNode []string
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) for _, k := range tar.Target {
if err != nil { targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{ if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
"PodName": pod.Name, return stacktrace.Propagate(err, "could not create helper pod")
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
} }
} }
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
} }
return nil return nil
} }
// createHelperPod derive the attributes for helper pod and create the helper pod // createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, labelSuffix string) error { func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateContainerKillFaultHelperPod")
defer span.End()
privilegedEnable := false privilegedEnable := false
if experimentsDetails.ContainerRuntime == "crio" { if experimentsDetails.ContainerRuntime == "crio" {
@ -238,10 +171,10 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
helperPod := &apiv1.Pod{ helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{ ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID, GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace, Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName), Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations, Annotations: chaosDetails.Annotations,
}, },
Spec: apiv1.PodSpec{ Spec: apiv1.PodSpec{
ServiceAccountName: experimentsDetails.ChaosServiceAccount, ServiceAccountName: experimentsDetails.ChaosServiceAccount,
@ -272,7 +205,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name container-kill", "./helpers -name container-kill",
}, },
Resources: chaosDetails.Resources, Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName), Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{ VolumeMounts: []apiv1.VolumeMount{
{ {
Name: "cri-socket", Name: "cri-socket",
@ -287,17 +220,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
}, },
} }
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{}) if len(chaosDetails.SideCar) != 0 {
return err helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
} }
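createHelperPod now relies on GenerateName, so Kubernetes appends a random suffix to each helper, and the experiment finds its helpers by the run-ID label (app=<experiment>-helper-<runID>) instead of by a fixed name. A small client-go sketch of that label-based lookup (namespace and names are placeholders):

package lib

import (
	"context"
	"fmt"

	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listHelperPods returns every helper created for one run, regardless of its generated name.
func listHelperPods(client kubernetes.Interface, namespace, experimentName, runID string) ([]string, error) {
	pods, err := client.CoreV1().Pods(namespace).List(context.Background(), v1.ListOptions{
		LabelSelector: fmt.Sprintf("app=%s-helper-%s", experimentName, runID),
	})
	if err != nil {
		return nil, err
	}
	var names []string
	for _, p := range pods.Items {
		names = append(names, p.Name)
	}
	return names, nil
}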
// getPodEnv derive all the env required for the helper pod
-func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
+func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
	var envDetails common.ENVDetails
-	envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
-		SetEnv("APP_POD", podName).
-		SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
+	envDetails.SetEnv("TARGETS", targets).
		SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
		SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
		SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@ -309,14 +248,17 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
		SetEnv("STATUS_CHECK_DELAY", strconv.Itoa(experimentsDetails.Delay)).
		SetEnv("STATUS_CHECK_TIMEOUT", strconv.Itoa(experimentsDetails.Timeout)).
		SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
+		SetEnv("CONTAINER_API_TIMEOUT", strconv.Itoa(experimentsDetails.ContainerAPITimeout)).
		SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
+		SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
+		SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
		SetEnvFromDownwardAPI("v1", "metadata.name")
	return envDetails.ENV
}

// SetChaosTunables will setup a random value within a given range of values
// If the value is not provided in range it'll setup the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
	experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
	experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
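Editor's note: the comment above describes ValidateRange as "pick a random value inside a range, otherwise keep the provided value". The sketch below only illustrates that described behaviour; it is not the common.ValidateRange implementation, and the package and function names are invented for the example.

package tunables

import (
	"math/rand"
	"strconv"
	"strings"
)

// pickFromRange mimics the documented behaviour: "60-80" yields a random
// value in [60, 80], while a plain "70" (or anything unparsable) is
// returned unchanged.
func pickFromRange(value string) string {
	parts := strings.Split(value, "-")
	if len(parts) != 2 {
		return value
	}
	lo, err1 := strconv.Atoi(parts[0])
	hi, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil || hi < lo {
		return value
	}
	return strconv.Itoa(lo + rand.Intn(hi-lo+1))
}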


@ -11,6 +11,11 @@ import (
"syscall" "syscall"
"time" "time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
@ -18,7 +23,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/result" "github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
"k8s.io/apimachinery/pkg/api/resource" "k8s.io/apimachinery/pkg/api/resource"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@ -28,7 +32,9 @@ import (
var inject, abort chan os.Signal

// Helper injects the disk-fill chaos
-func Helper(clients clients.ClientSets) {
+func Helper(ctx context.Context, clients clients.ClientSets) {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateDiskFillFault")
+	defer span.End()
	experimentsDetails := experimentTypes.ExperimentDetails{}
	eventsDetails := types.EventDetails{}
@ -51,6 +57,7 @@ func Helper(clients clients.ClientSets) {
	// Intialise the chaos attributes
	types.InitialiseChaosVariables(&chaosDetails)
+	chaosDetails.Phase = types.ChaosInjectPhase
	// Intialise Chaos Result Parameters
	types.SetResultAttributes(&resultDetails, chaosDetails)
@ -59,57 +66,58 @@ func Helper(clients clients.ClientSets) {
	result.SetResultUID(&resultDetails, clients, &chaosDetails)

	if err := diskFill(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
+		// update failstep inside chaosresult
+		if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
+			log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
+		}
		log.Fatalf("helper pod failed, err: %v", err)
	}
}

// diskFill contains steps to inject disk-fill chaos
func diskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
-	// Derive the container id of the target container
-	containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients)
+	targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
	if err != nil {
-		return err
+		return stacktrace.Propagate(err, "could not parse targets")
	}
-	// derive the used ephemeral storage size from the target container
-	du := fmt.Sprintf("sudo du /diskfill/%v", containerID)
-	cmd := exec.Command("/bin/bash", "-c", du)
-	out, err := cmd.CombinedOutput()
-	if err != nil {
-		log.Error(string(out))
-		return err
+	var targets []targetDetails
+	for _, t := range targetList.Target {
+		td := targetDetails{
+			Name:            t.Name,
+			Namespace:       t.Namespace,
+			TargetContainer: t.TargetContainer,
+			Source:          chaosDetails.ChaosPodName,
+		}
+		// Derive the container id of the target container
+		td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, chaosDetails.ChaosPodName)
+		if err != nil {
+			return stacktrace.Propagate(err, "could not get container id")
+		}
+		// extract out the pid of the target container
+		td.TargetPID, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
+		if err != nil {
+			return err
+		}
+		td.SizeToFill, err = getDiskSizeToFill(td, experimentsDetails, clients)
+		if err != nil {
+			return stacktrace.Propagate(err, "could not get disk size to fill")
+		}
+		log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
+			"PodName":         td.Name,
+			"Namespace":       td.Namespace,
+			"SizeToFill(KB)":  td.SizeToFill,
+			"TargetContainer": td.TargetContainer,
+		})
+		targets = append(targets, td)
	}
-	ephemeralStorageDetails := string(out)
-	// filtering out the used ephemeral storage from the output of du command
-	usedEphemeralStorageSize, err := filterUsedEphemeralStorage(ephemeralStorageDetails)
-	if err != nil {
-		return errors.Errorf("unable to filter used ephemeral storage size, err: %v", err)
-	}
-	log.Infof("used ephemeral storage space: %vKB", strconv.Itoa(usedEphemeralStorageSize))
-	// GetEphemeralStorageAttributes derive the ephemeral storage attributes from the target container
-	ephemeralStorageLimit, err := getEphemeralStorageAttributes(experimentsDetails, clients)
-	if err != nil {
-		return err
-	}
-	if ephemeralStorageLimit == 0 && experimentsDetails.EphemeralStorageMebibytes == "0" {
-		return errors.Errorf("either provide ephemeral storage limit inside target container or define EPHEMERAL_STORAGE_MEBIBYTES ENV")
-	}
-	// deriving the ephemeral storage size to be filled
-	sizeTobeFilled := getSizeToBeFilled(experimentsDetails, usedEphemeralStorageSize, int(ephemeralStorageLimit))
-	log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
-		"PodName":                   experimentsDetails.TargetPods,
-		"ContainerName":             experimentsDetails.TargetContainer,
-		"ephemeralStorageLimit(KB)": ephemeralStorageLimit,
-		"ContainerID":               containerID,
-	})
-	log.Infof("ephemeral storage size to be filled: %vKB", strconv.Itoa(sizeTobeFilled))
	// record the event inside chaosengine
	if experimentsDetails.EngineName != "" {
@ -119,65 +127,80 @@ func diskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients cli
	}

	// watching for the abort signal and revert the chaos
-	go abortWatcher(experimentsDetails, clients, containerID, resultDetails.Name)
+	go abortWatcher(targets, experimentsDetails, clients, resultDetails.Name)

-	if sizeTobeFilled > 0 {
-		if err := fillDisk(containerID, sizeTobeFilled, experimentsDetails.DataBlockSize); err != nil {
-			log.Error(string(out))
-			return err
-		}
-		if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
-			return err
-		}
-		log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
-		common.WaitForDuration(experimentsDetails.ChaosDuration)
-		log.Info("[Chaos]: Stopping the experiment")
-		// It will delete the target pod if target pod is evicted
-		// if target pod is still running then it will delete all the files, which was created earlier during chaos execution
-		err = remedy(experimentsDetails, clients, containerID)
-		if err != nil {
-			return errors.Errorf("unable to perform remedy operation, err: %v", err)
-		}
-		if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
-			return err
-		}
-	} else {
-		log.Warn("No required free space found!, It's Housefull")
-	}
-	return nil
-}
-
-// fillDisk fill the ephemeral disk by creating files
-func fillDisk(containerID string, sizeTobeFilled, bs int) error {
	select {
	case <-inject:
		// stopping the chaos execution, if abort signal received
		os.Exit(1)
	default:
-		// Creating files to fill the required ephemeral storage size of block size of 4K
-		log.Infof("[Fill]: Filling ephemeral storage, size: %vKB", sizeTobeFilled)
-		dd := fmt.Sprintf("sudo dd if=/dev/urandom of=/diskfill/%v/diskfill bs=%vK count=%v", containerID, bs, strconv.Itoa(sizeTobeFilled/bs))
-		log.Infof("dd: {%v}", dd)
-		cmd := exec.Command("/bin/bash", "-c", dd)
-		_, err := cmd.CombinedOutput()
-		return err
+	}

+	for _, t := range targets {
+		if t.SizeToFill > 0 {
+			if err := fillDisk(t, experimentsDetails.DataBlockSize); err != nil {
+				return stacktrace.Propagate(err, "could not fill ephemeral storage")
+			}
+			log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
+			if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
+				if revertErr := revertDiskFill(t, clients); revertErr != nil {
+					return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
+				}
+				return stacktrace.Propagate(err, "could not annotate chaosresult")
+			}
+		} else {
+			log.Warn("No required free space found!")
+		}
+	}

+	log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
+	common.WaitForDuration(experimentsDetails.ChaosDuration)
+	log.Info("[Chaos]: Stopping the experiment")

+	var errList []string
+	for _, t := range targets {
+		// It will delete the target pod if target pod is evicted
+		// if target pod is still running then it will delete all the files, which was created earlier during chaos execution
+		if err = revertDiskFill(t, clients); err != nil {
+			errList = append(errList, err.Error())
+			continue
+		}
+		if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
+			errList = append(errList, err.Error())
+		}
+	}
+	if len(errList) != 0 {
+		return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
+	}
+	return nil
+}
// fillDisk fill the ephemeral disk by creating files
func fillDisk(t targetDetails, bs int) error {
// Creating files to fill the required ephemeral storage size of block size of 4K
log.Infof("[Fill]: Filling ephemeral storage, size: %vKB", t.SizeToFill)
dd := fmt.Sprintf("sudo dd if=/dev/urandom of=/proc/%v/root/home/diskfill bs=%vK count=%v", t.TargetPID, bs, strconv.Itoa(t.SizeToFill/bs))
log.Infof("dd: {%v}", dd)
cmd := exec.Command("/bin/bash", "-c", dd)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: string(out)}
	}
	return nil
}
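Editor's note: the dd invocation above derives count by integer division, so the amount actually written rounds down to a multiple of the block size. The tiny sketch below only restates that arithmetic; the package and function names are illustrative, not part of the diff. For example, with SizeToFill = 1,000,000 KB and DATA_BLOCK_SIZE = 256, count is 3906 and 999,936 KB are written.

package fillcalc

import "fmt"

// ddArgs mirrors the arithmetic in fillDisk: count = sizeToFill / blockSize
// (integer division), so the written size is count * blockSize KB.
func ddArgs(sizeToFillKB, blockSizeKB int) string {
	count := sizeToFillKB / blockSizeKB
	return fmt.Sprintf("bs=%dK count=%d (writes %d KB)", blockSizeKB, count, count*blockSizeKB)
}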
// getEphemeralStorageAttributes derive the ephemeral storage attributes from the target pod
-func getEphemeralStorageAttributes(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int64, error) {
-	pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), experimentsDetails.TargetPods, v1.GetOptions{})
+func getEphemeralStorageAttributes(t targetDetails, clients clients.ClientSets) (int64, error) {
+	pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
	if err != nil {
-		return 0, err
+		return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
	}

	var ephemeralStorageLimit int64
@ -186,7 +209,7 @@ func getEphemeralStorageAttributes(experimentsDetails *experimentTypes.Experimen
	// Extracting ephemeral storage limit & requested value from the target container
	// It will be in the form of Kb
	for _, container := range containers {
-		if container.Name == experimentsDetails.TargetContainer {
+		if container.Name == t.TargetContainer {
			ephemeralStorageLimit = container.Resources.Limits.StorageEphemeral().ToDec().ScaledValue(resource.Kilo)
			break
		}
@ -203,7 +226,7 @@ func filterUsedEphemeralStorage(ephemeralStorageDetails string) (int, error) {
	ephemeralStorageAll := strings.Split(ephemeralStorageDetails, "\n")
	// It will return the details of main directory
	ephemeralStorageAllDiskFill := strings.Split(ephemeralStorageAll[len(ephemeralStorageAll)-2], "\t")[0]
-	// type casting string to interger
+	// type casting string to integer
	ephemeralStorageSize, err := strconv.Atoi(ephemeralStorageAllDiskFill)
	return ephemeralStorageSize, err
}
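Editor's note: filterUsedEphemeralStorage takes the second-to-last newline-split element, presumably because the du output ends with a trailing newline, so that element is the summary line for the top directory; its first tab-separated field is the used size in KB. The snippet below repeats the same slicing on a made-up du output (the sample paths and numbers are illustrative only).

package duparse

import (
	"fmt"
	"strconv"
	"strings"
)

// usedKB extracts the total from output shaped like:
//
//	1024\t/proc/42/root/var/log
//	8192\t/proc/42/root/home
//	524288\t/proc/42/root
//	(trailing newline)
//
// and returns 524288.
func usedKB(duOutput string) (int, error) {
	lines := strings.Split(duOutput, "\n")
	if len(lines) < 2 {
		return 0, fmt.Errorf("unexpected du output: %q", duOutput)
	}
	sizeField := strings.Split(lines[len(lines)-2], "\t")[0]
	return strconv.Atoi(sizeField)
}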
@ -226,40 +249,38 @@ func getSizeToBeFilled(experimentsDetails *experimentTypes.ExperimentDetails, us
return needToBeFilled return needToBeFilled
} }
-// remedy will delete the target pod if target pod is evicted
+// revertDiskFill will delete the target pod if target pod is evicted
// if target pod is still running then it will delete the files, which was created during chaos execution
-func remedy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, containerID string) error {
-	pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), experimentsDetails.TargetPods, v1.GetOptions{})
+func revertDiskFill(t targetDetails, clients clients.ClientSets) error {
+	pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
	if err != nil {
-		return err
+		return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
	}
-	// Deleting the pod as pod is already evicted
	podReason := pod.Status.Reason
	if podReason == "Evicted" {
+		// Deleting the pod as pod is already evicted
		log.Warn("Target pod is evicted, deleting the pod")
-		if err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), experimentsDetails.TargetPods, v1.DeleteOptions{}); err != nil {
-			return err
+		if err := clients.KubeClient.CoreV1().Pods(t.Namespace).Delete(context.Background(), t.Name, v1.DeleteOptions{}); err != nil {
+			return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to delete target pod after eviction :%s", err.Error())}
		}
	} else {
		// deleting the files after chaos execution
-		rm := fmt.Sprintf("sudo rm -rf /diskfill/%v/diskfill", containerID)
+		rm := fmt.Sprintf("sudo rm -rf /proc/%v/root/home/diskfill", t.TargetPID)
		cmd := exec.Command("/bin/bash", "-c", rm)
		out, err := cmd.CombinedOutput()
		if err != nil {
-			log.Error(string(out))
-			return err
+			log.Error(err.Error())
+			return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to cleanup ephemeral storage: %s", string(out))}
		}
	}
+	log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
	return nil
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
	experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
	experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
-	experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
-	experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
-	experimentDetails.TargetPods = types.Getenv("APP_POD", "")
	experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
	experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
	experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@ -268,10 +289,12 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
	experimentDetails.FillPercentage = types.Getenv("FILL_PERCENTAGE", "")
	experimentDetails.EphemeralStorageMebibytes = types.Getenv("EPHEMERAL_STORAGE_MEBIBYTES", "")
	experimentDetails.DataBlockSize, _ = strconv.Atoi(types.Getenv("DATA_BLOCK_SIZE", "256"))
+	experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
+	experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
}
// abortWatcher continuously watch for the abort signals
-func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, containerID, resultName string) {
+func abortWatcher(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultName string) {
	// waiting till the abort signal received
	<-abort
@ -280,15 +303,72 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients
	// retry thrice for the chaos revert
	retry := 3
	for retry > 0 {
-		if err := remedy(experimentsDetails, clients, containerID); err != nil {
-			log.Errorf("unable to perform remedy operation, err: %v", err)
+		for _, t := range targets {
+			err := revertDiskFill(t, clients)
+			if err != nil {
+				log.Errorf("unable to kill disk-fill process, err :%v", err)
+				continue
+			}
+			if err = result.AnnotateChaosResult(resultName, experimentsDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
+				log.Errorf("unable to annotate the chaosresult, err :%v", err)
+			}
		}
		retry--
		time.Sleep(1 * time.Second)
	}
-	if err := result.AnnotateChaosResult(resultName, experimentsDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
-		log.Errorf("unable to annotate the chaosresult, err :%v", err)
-	}
	log.Info("Chaos Revert Completed")
	os.Exit(1)
}
func getDiskSizeToFill(t targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int, error) {
usedEphemeralStorageSize, err := getUsedEphemeralStorage(t)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get used ephemeral storage")
}
// GetEphemeralStorageAttributes derive the ephemeral storage attributes from the target container
ephemeralStorageLimit, err := getEphemeralStorageAttributes(t, clients)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get ephemeral storage attributes")
}
if ephemeralStorageLimit == 0 && experimentsDetails.EphemeralStorageMebibytes == "0" {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: "either provide ephemeral storage limit inside target container or define EPHEMERAL_STORAGE_MEBIBYTES ENV"}
}
// deriving the ephemeral storage size to be filled
sizeTobeFilled := getSizeToBeFilled(experimentsDetails, usedEphemeralStorageSize, int(ephemeralStorageLimit))
return sizeTobeFilled, nil
}
func getUsedEphemeralStorage(t targetDetails) (int, error) {
// derive the used ephemeral storage size from the target container
du := fmt.Sprintf("sudo du /proc/%v/root", t.TargetPID)
cmd := exec.Command("/bin/bash", "-c", du)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", string(out))}
}
ephemeralStorageDetails := string(out)
// filtering out the used ephemeral storage from the output of du command
usedEphemeralStorageSize, err := filterUsedEphemeralStorage(ephemeralStorageDetails)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", err.Error())}
}
log.Infof("used ephemeral storage space: %vKB", strconv.Itoa(usedEphemeralStorageSize))
return usedEphemeralStorageSize, nil
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
SizeToFill int
TargetPID int
Source string
}


@ -2,37 +2,43 @@ package lib
import ( import (
"context" "context"
"fmt"
"os"
"strconv" "strconv"
"strings" "strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/exec" "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1" apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
//PrepareDiskFill contains the prepration steps before chaos injection // PrepareDiskFill contains the preparation steps before chaos injection
func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareDiskFill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDiskFillFault")
defer span.End()
targetPodList := apiv1.PodList{}
var err error var err error
var podsAffectedPerc int // It will contain all the pod & container details required for exec command
// It will contains all the pod & container details required for exec command
execCommandDetails := exec.PodDetails{} execCommandDetails := exec.PodDetails{}
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
} }
//setup the tunables if provided in range //set up the tunables if provided in range
setChaosTunables(experimentsDetails) setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{ log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
@ -42,33 +48,11 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
"Sequence": experimentsDetails.Sequence, "Sequence": experimentsDetails.Sequence,
}) })
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc) targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if experimentsDetails.NodeLabel == "" { if err != nil {
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails) return stacktrace.Propagate(err, "could not get target pods")
if err != nil {
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
} }
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime) log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@ -79,28 +63,28 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
if experimentsDetails.ChaosServiceAccount == "" { if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients) experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil { if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err) return stacktrace.Propagate(err, "could not get experiment service account")
} }
} }
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil { if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err return stacktrace.Propagate(err, "could not set helper data")
} }
} }
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "") experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -112,13 +96,12 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
} }
// injectChaosInSerialMode fill the ephemeral storage of all target application serially (one by one) // injectChaosInSerialMode fill the ephemeral storage of all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInSerialMode")
labelSuffix := common.GetRunID() defer span.End()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -128,39 +111,18 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
runID := common.GetRunID() runID := stringutils.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil { if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err) return stacktrace.Propagate(err, "could not create helper pod")
} }
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, %v", err)
} }
} }
@ -169,86 +131,69 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
// injectChaosInParallelMode fill the ephemeral storage of of all target application in parallel mode (all at once) // injectChaosInParallelMode fill the ephemeral storage of of all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
// creating the helper pod to perform disk-fill chaos runID := stringutils.GetRunID()
for _, pod := range targetPodList.Items { targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//Get the target container name of the application pod for node, tar := range targets {
if !experimentsDetails.IsTargetContainerProvided { var targetsPerNode []string
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) for _, k := range tar.Target {
if err != nil { targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
runID := common.GetRunID() if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil { return stacktrace.Propagate(err, "could not create helper pod")
return errors.Errorf("unable to create the helper pod, err: %v", err)
} }
} }
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, %v", err)
} }
return nil return nil
} }
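Editor's note: both injection modes above hand the victims to the helper through a single TARGETS environment variable, built as "name:namespace:container" entries joined with ";" per node, and decoded on the helper side by common.ParseTargets (whose implementation is not part of this diff). The sketch below only illustrates that string format; the package, type, and function names are invented for the example.

package targetsenv

import (
	"fmt"
	"strings"
)

type target struct {
	Name, Namespace, Container string
}

// encode builds the "name:namespace:container;name:namespace:container" form
// that the inject functions pass to createHelperPod.
func encode(ts []target) string {
	parts := make([]string, 0, len(ts))
	for _, t := range ts {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	return strings.Join(parts, ";")
}

// decode splits the same string back into its entries, skipping malformed ones.
func decode(s string) []target {
	var ts []target
	for _, p := range strings.Split(s, ";") {
		f := strings.Split(p, ":")
		if len(f) == 3 {
			ts = append(ts, target{Name: f[0], Namespace: f[1], Container: f[2]})
		}
	}
	return ts
}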
// createHelperPod derive the attributes for helper pod and create the helper pod // createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error { func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, appNodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDiskFillFaultHelperPod")
defer span.End()
mountPropagationMode := apiv1.MountPropagationHostToContainer privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds) terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{ helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{ ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID, GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace, Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName), Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations, Annotations: chaosDetails.Annotations,
}, },
Spec: apiv1.PodSpec{ Spec: apiv1.PodSpec{
HostPID: true,
RestartPolicy: apiv1.RestartPolicyNever, RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets, ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName, NodeName: appNodeName,
ServiceAccountName: experimentsDetails.ChaosServiceAccount, ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds, TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Volumes: []apiv1.Volume{ Volumes: []apiv1.Volume{
{ {
Name: "udev", Name: "socket-path",
VolumeSource: apiv1.VolumeSource{ VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{ HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.ContainerPath, Path: experimentsDetails.SocketPath,
}, },
}, },
}, },
@ -266,29 +211,38 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name disk-fill", "./helpers -name disk-fill",
}, },
Resources: chaosDetails.Resources, Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, appName), Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{ VolumeMounts: []apiv1.VolumeMount{
{ {
Name: "udev", Name: "socket-path",
MountPath: "/diskfill", MountPath: experimentsDetails.SocketPath,
MountPropagation: &mountPropagationMode,
}, },
}, },
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
},
}, },
}, },
}, },
} }
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{}) if len(chaosDetails.SideCar) != 0 {
return err helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
} }
// getPodEnv derive all the env required for the helper pod // getPodEnv derive all the env required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar { func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS). envDetails.SetEnv("TARGETS", targets).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer). SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)). SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace). SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
@ -299,13 +253,17 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("EPHEMERAL_STORAGE_MEBIBYTES", experimentsDetails.EphemeralStorageMebibytes). SetEnv("EPHEMERAL_STORAGE_MEBIBYTES", experimentsDetails.EphemeralStorageMebibytes).
SetEnv("DATA_BLOCK_SIZE", strconv.Itoa(experimentsDetails.DataBlockSize)). SetEnv("DATA_BLOCK_SIZE", strconv.Itoa(experimentsDetails.DataBlockSize)).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID). SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name") SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV return envDetails.ENV
} }
//setChaosTunables will setup a random value within a given range of values // setChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value. // If the value is not provided in range it'll setup the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) { func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.FillPercentage = common.ValidateRange(experimentsDetails.FillPercentage) experimentsDetails.FillPercentage = common.ValidateRange(experimentsDetails.FillPercentage)
experimentsDetails.EphemeralStorageMebibytes = common.ValidateRange(experimentsDetails.EphemeralStorageMebibytes) experimentsDetails.EphemeralStorageMebibytes = common.ValidateRange(experimentsDetails.EphemeralStorageMebibytes)


@ -2,31 +2,37 @@ package lib
import ( import (
"context" "context"
"fmt"
"strconv" "strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/docker-service-kill/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/docker-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1" apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
// PrepareDockerServiceKill contains prepration steps before chaos injection // PrepareDockerServiceKill contains prepration steps before chaos injection
func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareDockerServiceKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDockerServiceKillFault")
defer span.End()
var err error var err error
if experimentsDetails.TargetNode == "" { if experimentsDetails.TargetNode == "" {
//Select node for docker-service-kill //Select node for docker-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients) experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get node name")
} }
} }
@ -34,7 +40,7 @@ func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDeta
"NodeName": experimentsDetails.TargetNode, "NodeName": experimentsDetails.TargetNode,
}) })
experimentsDetails.RunID = common.GetRunID() experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
@ -50,52 +56,19 @@ func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDeta
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil { if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err return stacktrace.Propagate(err, "could not set helper data")
} }
} }
// Creating the helper pod to perform docker-service-kill // Creating the helper pod to perform docker-service-kill
if err = createHelperPod(experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil { if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err) return stacktrace.Propagate(err, "could not create helper pod")
} }
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pod") return err
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
}
// Checking for the node to be in not-ready state
log.Info("[Status]: Check for the node to be in NotReady state")
if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("application node is not in NotReady state, err: %v", err)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
} }
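Editor's note: across these faults the old inline sequence (check helper status, wait for completion, delete helpers per the job cleanup policy) is collapsed into one call, common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true), whose body is not part of this diff. The sketch below is only a guess at the kind of wrapper it replaces, pieced together from the removed pre-refactor calls shown above; signatures are copied from that removed code and may have changed since.

package lifecycle

import (
	"github.com/litmuschaos/litmus-go/pkg/clients"
	"github.com/litmuschaos/litmus-go/pkg/status"
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/litmuschaos/litmus-go/pkg/utils/common"
	"github.com/pkg/errors"
)

// runHelperLifecycle reconstructs the removed inline flow: wait for the
// helper pods to come up, wait for them to finish, then clean up according
// to the job cleanup policy.
func runHelperLifecycle(appLabel, chaosNamespace, experimentName string, timeout, delay, chaosDuration int, chaosDetails *types.ChaosDetails, clientSets clients.ClientSets) error {
	if err := status.CheckHelperStatus(chaosNamespace, appLabel, timeout, delay, clientSets); err != nil {
		common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clientSets)
		return errors.Errorf("helper pods are not in running state, err: %v", err)
	}
	podStatus, err := status.WaitForCompletion(chaosNamespace, appLabel, clientSets, chaosDuration+timeout, experimentName)
	if err != nil || podStatus == "Failed" {
		common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clientSets)
		return common.HelperFailedError(err)
	}
	return common.DeleteAllPod(appLabel, chaosNamespace, timeout, delay, clientSets)
}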
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -107,7 +80,9 @@ func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDeta
} }
// createHelperPod derive the attributes for helper pod and create the helper pod // createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error { func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDockerServiceKillFaultHelperPod")
defer span.End()
privileged := true privileged := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds) terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
@ -116,7 +91,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
ObjectMeta: v1.ObjectMeta{ ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID, Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace, Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, "", experimentsDetails.ExperimentName), Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations, Annotations: chaosDetails.Annotations,
}, },
Spec: apiv1.PodSpec{ Spec: apiv1.PodSpec{
@ -188,8 +163,16 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
}, },
} }
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{}) _, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
} }
func ptrint64(p int64) *int64 { func ptrint64(p int64) *int64 {


@@ -1,18 +1,23 @@
package lib
import (
+"context"
+"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
-clients "github.com/litmuschaos/litmus-go/pkg/clients"
+"github.com/litmuschaos/litmus-go/pkg/cerrors"
+"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
+"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
-"github.com/pkg/errors"
+"github.com/palantir/stacktrace"
+"go.opentelemetry.io/otel"
)
var (
@@ -20,8 +25,10 @@ var (
inject, abort chan os.Signal
)
-//PrepareEBSLossByID contains the prepration and injection steps for the experiment
+// PrepareEBSLossByID contains the prepration and injection steps for the experiment
-func PrepareEBSLossByID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func PrepareEBSLossByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByID")
+defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -48,22 +55,22 @@ func PrepareEBSLossByID(experimentsDetails *experimentTypes.ExperimentDetails, c
//get the volume id or list of instance ids
volumeIDList := strings.Split(experimentsDetails.EBSVolumeID, ",")
if len(volumeIDList) == 0 {
-return errors.Errorf("no volume id found to detach")
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume id found to detach"}
}
// watching for the abort signal and revert the chaos
go ebsloss.AbortWatcher(experimentsDetails, volumeIDList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
-if err = ebsloss.InjectChaosInSerialMode(experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
-if err = ebsloss.InjectChaosInParallelMode(experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
-return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
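The error-handling change in this file is the same one applied throughout the changeset: errors raised at this layer become a typed error carrying an error code and a reason, while errors returned by lower-level calls are wrapped with stacktrace.Propagate instead of being reformatted with errors.Errorf. A rough self-contained sketch of that split is below; FaultError and its code strings are stand-ins for the real cerrors package, which this sketch deliberately does not import.

package main

import (
	"fmt"

	"github.com/palantir/stacktrace"
)

// FaultError is a stand-in for a structured error type: a machine-readable
// code plus a human-readable reason and target.
type FaultError struct {
	Code   string
	Reason string
	Target string
}

func (e FaultError) Error() string {
	return fmt.Sprintf("[%s] %s (target: %s)", e.Code, e.Reason, e.Target)
}

func detachVolume(volumeID string) error {
	// Errors raised at this layer get a typed error with a code and target.
	if volumeID == "" {
		return FaultError{Code: "TARGET_SELECTION", Reason: "no volume id found to detach", Target: "EBS"}
	}
	return nil
}

func runSerial(volumeIDs []string) error {
	for _, id := range volumeIDs {
		// Errors from lower layers are wrapped with context rather than reworded.
		if err := detachVolume(id); err != nil {
			return stacktrace.Propagate(err, "could not run chaos in serial mode")
		}
	}
	return nil
}

func main() {
	if err := runSerial([]string{"vol-1", ""}); err != nil {
		fmt.Println(err)
	}
}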


@@ -1,18 +1,23 @@
package lib
import (
+"context"
+"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
-clients "github.com/litmuschaos/litmus-go/pkg/clients"
+"github.com/litmuschaos/litmus-go/pkg/cerrors"
+"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
+"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
-"github.com/pkg/errors"
+"github.com/palantir/stacktrace"
+"go.opentelemetry.io/otel"
)
var (
@@ -20,8 +25,10 @@ var (
inject, abort chan os.Signal
)
-//PrepareEBSLossByTag contains the prepration and injection steps for the experiment
+// PrepareEBSLossByTag contains the prepration and injection steps for the experiment
-func PrepareEBSLossByTag(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func PrepareEBSLossByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByTag")
+defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -53,15 +60,15 @@ func PrepareEBSLossByTag(experimentsDetails *experimentTypes.ExperimentDetails,
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
-if err = ebsloss.InjectChaosInSerialMode(experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
-if err = ebsloss.InjectChaosInParallelMode(experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
-return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
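Both EBS prepare functions keep the SEQUENCE dispatch they had before; only the call signatures and error wrapping change. Serial mode injects and reverts one target at a time, parallel mode injects on every target before reverting any of them. A compact sketch of those two shapes follows, with detach and attach as hypothetical stand-ins for the AWS volume calls:

package main

import (
	"fmt"
	"strings"
)

func detach(id string) { fmt.Println("detach", id) }
func attach(id string) { fmt.Println("attach", id) }

// injectSerial handles one target end-to-end before moving to the next.
func injectSerial(ids []string) {
	for _, id := range ids {
		detach(id)
		attach(id)
	}
}

// injectParallel detaches every target first, then reverts them together.
func injectParallel(ids []string) {
	for _, id := range ids {
		detach(id)
	}
	for _, id := range ids {
		attach(id)
	}
}

func run(sequence string, ids []string) error {
	switch strings.ToLower(sequence) {
	case "serial":
		injectSerial(ids)
	case "parallel":
		injectParallel(ids)
	default:
		return fmt.Errorf("'%s' sequence is not supported", sequence)
	}
	return nil
}

func main() {
	_ = run("parallel", []string{"vol-a", "vol-b"})
}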


@@ -1,22 +1,29 @@
package lib
import (
+"context"
+"fmt"
"os"
"time"
-clients "github.com/litmuschaos/litmus-go/pkg/clients"
+"github.com/litmuschaos/litmus-go/pkg/cerrors"
+"github.com/litmuschaos/litmus-go/pkg/clients"
ebs "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ebs"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
+"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
-"github.com/pkg/errors"
+"github.com/palantir/stacktrace"
+"go.opentelemetry.io/otel"
)
-//InjectChaosInSerialMode will inject the ebs loss chaos in serial mode which means one after other
+// InjectChaosInSerialMode will inject the ebs loss chaos in serial mode which means one after other
-func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInSerialMode")
+defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@@ -34,13 +41,13 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get volume attachment details
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
-return errors.Errorf("fail to get the attachment info, err: %v", err)
+return stacktrace.Propagate(err, "failed to get the attachment info")
}
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err = ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
-return errors.Errorf("ebs detachment failed, err: %v", err)
+return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
@@ -48,14 +55,14 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err = ebs.WaitForVolumeDetachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to detach the ebs volume to the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ebs detachment failed")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
-if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -66,7 +73,7 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceID, experimentsDetails.Region)
if err != nil {
-return errors.Errorf("failed to get the ebs status, err: %v", err)
+return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
@@ -76,13 +83,13 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Attaching the ebs volume from the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceID, device, experimentsDetails.Region); err != nil {
-return errors.Errorf("ebs attachment failed, err: %v", err)
+return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for %v volume", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to attach the ebs volume to the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
@@ -92,8 +99,10 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
return nil
}
-//InjectChaosInParallelMode will inject the chaos in parallel mode that means all at once
+// InjectChaosInParallelMode will inject the chaos in parallel mode that means all at once
-func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInParallelMode")
+defer span.End()
var ec2InstanceIDList, deviceList []string
@@ -112,8 +121,15 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//prepare the instaceIDs and device name for all the given volume
for _, volumeID := range targetEBSVolumeIDList {
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
-if err != nil || ec2InstanceID == "" || device == "" {
+if err != nil {
-return errors.Errorf("fail to get the attachment info, err: %v", err)
+return stacktrace.Propagate(err, "failed to get the attachment info")
+}
+if ec2InstanceID == "" || device == "" {
+return cerrors.Error{
+ErrorCode: cerrors.ErrorTypeChaosInject,
+Reason: "Volume not attached to any instance",
+Target: fmt.Sprintf("EBS Volume ID: %v", volumeID),
+}
}
ec2InstanceIDList = append(ec2InstanceIDList, ec2InstanceID)
deviceList = append(deviceList, device)
@@ -123,28 +139,28 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err := ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
-return errors.Errorf("ebs detachment failed, err: %v", err)
+return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
}
log.Info("[Info]: Checking if the detachment process initiated")
if err := ebs.CheckEBSDetachmentInitialisation(targetEBSVolumeIDList, ec2InstanceIDList, experimentsDetails.Region); err != nil {
-return errors.Errorf("fail to initialise the detachment")
+return stacktrace.Propagate(err, "failed to initialise the detachment")
}
for i, volumeID := range targetEBSVolumeIDList {
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err := ebs.WaitForVolumeDetachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to detach the ebs volume to the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ebs detachment failed")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -157,7 +173,7 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceIDList[i], experimentsDetails.Region)
if err != nil {
-return errors.Errorf("failed to get the ebs status, err: %v", err)
+return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
@@ -167,13 +183,13 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Attaching the ebs volume from the instance
log.Info("[Chaos]: Attaching the EBS volume from the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceIDList[i], deviceList[i], experimentsDetails.Region); err != nil {
-return errors.Errorf("ebs attachment failed, err: %v", err)
+return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for volume %v", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to attach the ebs volume to the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
@@ -193,13 +209,13 @@ func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, volumeI
//Get volume attachment details
instanceID, deviceName, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
-log.Errorf("fail to get the attachment info, err: %v", err)
+log.Errorf("Failed to get the attachment info: %v", err)
}
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(experimentsDetails.EBSVolumeID, instanceID, experimentsDetails.Region)
if err != nil {
-log.Errorf("failed to get the ebs status when an abort signal is received, err: %v", err)
+log.Errorf("Failed to get the ebs status when an abort signal is received: %v", err)
}
if ebsState != "attached" {
@@ -207,13 +223,13 @@ func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, volumeI
//We first wait for the volume to get in detached state then we are attaching it.
log.Info("[Abort]: Wait for EBS complete volume detachment")
if err = ebs.WaitForVolumeDetachment(experimentsDetails.EBSVolumeID, instanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-log.Errorf("unable to detach the ebs volume, err: %v", err)
+log.Errorf("Unable to detach the ebs volume: %v", err)
}
//Attaching the ebs volume from the instance
log.Info("[Chaos]: Attaching the EBS volume from the instance")
err = ebs.EBSVolumeAttach(experimentsDetails.EBSVolumeID, instanceID, deviceName, experimentsDetails.Region)
if err != nil {
-log.Errorf("ebs attachment failed when an abort signal is received, err: %v", err)
+log.Errorf("EBS attachment failed when an abort signal is received: %v", err)
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
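Only the log wording changes inside AbortWatcher above, but the shape it preserves is worth spelling out: the watcher runs as a goroutine, blocks until an abort signal arrives, then best-effort re-attaches whatever is still detached, logging failures rather than returning them so the revert loop never stops early. A self-contained sketch under those assumptions; reattach and the final exit are illustrative stand-ins, not the actual litmus helpers, and the real watcher's termination behaviour is not shown in this diff.

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// reattach is a stand-in for the revert step (wait for detachment, attach back).
func reattach(id string) error {
	fmt.Println("re-attaching", id)
	return nil
}

// abortWatcher blocks until an abort signal arrives, then reverts every
// target; errors are only logged so the loop continues across targets.
func abortWatcher(abort chan os.Signal, targets []string) {
	<-abort
	fmt.Println("[Abort]: abort signal received, reverting targets")
	for _, id := range targets {
		if err := reattach(id); err != nil {
			fmt.Printf("failed to revert %s: %v\n", id, err)
		}
	}
	os.Exit(1) // a common choice so callers see the run was aborted
}

func main() {
	abort := make(chan os.Signal, 1)
	signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
	go abortWatcher(abort, []string{"vol-a", "vol-b"})
	time.Sleep(2 * time.Second) // stand-in for the chaos duration
}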


@@ -1,21 +1,26 @@
package lib
import (
+"context"
+"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
-clients "github.com/litmuschaos/litmus-go/pkg/clients"
+"github.com/litmuschaos/litmus-go/pkg/cerrors"
+"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-id/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
+"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
-"github.com/pkg/errors"
+"github.com/palantir/stacktrace"
+"go.opentelemetry.io/otel"
)
var (
@@ -23,8 +28,10 @@ var (
inject, abort chan os.Signal
)
-//PrepareEC2TerminateByID contains the prepration and injection steps for the experiment
+// PrepareEC2TerminateByID contains the prepration and injection steps for the experiment
-func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func PrepareEC2TerminateByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByID")
+defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -44,8 +51,8 @@ func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetai
//get the instance id or list of instance ids
instanceIDList := strings.Split(experimentsDetails.Ec2InstanceID, ",")
-if len(instanceIDList) == 0 {
+if experimentsDetails.Ec2InstanceID == "" || len(instanceIDList) == 0 {
-return errors.Errorf("no instance id found to terminate")
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no EC2 instance ID found to terminate"}
}
// watching for the abort signal and revert the chaos
@@ -53,15 +60,15 @@ func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetai
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
-if err = injectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
-if err = injectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
-return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -72,8 +79,10 @@ func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetai
return nil
}
-//injectChaosInSerialMode will inject the ec2 instance termination in serial mode that is one after other
+// injectChaosInSerialMode will inject the ec2 instance termination in serial mode that is one after other
-func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInSerialMode")
+defer span.End()
select {
case <-inject:
@@ -100,7 +109,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
-return errors.Errorf("ec2 instance failed to stop, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
@@ -108,14 +117,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
-if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -127,13 +136,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
-return errors.Errorf("ec2 instance failed to start, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-return errors.Errorf("unable to start the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
@@ -145,7 +154,9 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode will inject the ec2 instance termination in parallel mode that is all at once
-func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInParallelMode")
+defer span.End()
select {
case <-inject:
@@ -171,7 +182,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
-return errors.Errorf("ec2 instance failed to stop, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
@@ -180,15 +191,15 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "reverted", "EC2 Instance ID", chaosDetails)
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -202,7 +213,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
-return errors.Errorf("ec2 instance failed to start, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
@@ -210,7 +221,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-return errors.Errorf("unable to start the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
@@ -232,19 +243,19 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
-log.Errorf("fail to get instance status when an abort signal is received,err :%v", err)
+log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-log.Errorf("unable to wait till stop of the instance, err: %v", err)
+log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
-log.Errorf("ec2 instance failed to start when an abort signal is received, err: %v", err)
+log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
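One behavioural detail these hunks keep while threading ctx through probe.RunProbes: in serial mode the DuringChaos probes are only started on the first iteration (i == 0) so they run once for the whole chaos duration, whereas parallel mode starts them once after every target has been stopped. A small sketch of that guard; runProbes and the target names are hypothetical stand-ins:

package main

import "fmt"

func runProbes(phase string) error {
	fmt.Println("running probes for phase:", phase)
	return nil
}

// serialInjection starts the DuringChaos probes only on the first target so
// they run once for the full duration instead of once per target.
func serialInjection(targets []string, hasProbes bool) error {
	for i, t := range targets {
		fmt.Println("stopping", t)
		if hasProbes && i == 0 {
			if err := runProbes("DuringChaos"); err != nil {
				return fmt.Errorf("failed to run probes: %w", err)
			}
		}
		fmt.Println("starting", t)
	}
	return nil
}

func main() {
	_ = serialInjection([]string{"i-111", "i-222"}, true)
}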


@@ -1,28 +1,35 @@
package lib
import (
+"context"
+"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
-clients "github.com/litmuschaos/litmus-go/pkg/clients"
+"github.com/litmuschaos/litmus-go/pkg/cerrors"
+"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-tag/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
+"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
-"github.com/pkg/errors"
+"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
+"go.opentelemetry.io/otel"
)
var inject, abort chan os.Signal
-//PrepareEC2TerminateByTag contains the prepration and injection steps for the experiment
+// PrepareEC2TerminateByTag contains the prepration and injection steps for the experiment
-func PrepareEC2TerminateByTag(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func PrepareEC2TerminateByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByTag")
+defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -48,15 +55,15 @@ func PrepareEC2TerminateByTag(experimentsDetails *experimentTypes.ExperimentDeta
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
-if err := injectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err := injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
-if err := injectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err := injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
-return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -67,8 +74,10 @@ func PrepareEC2TerminateByTag(experimentsDetails *experimentTypes.ExperimentDeta
return nil
}
-//injectChaosInSerialMode will inject the ce2 instance termination in serial mode that is one after other
+// injectChaosInSerialMode will inject the ce2 instance termination in serial mode that is one after other
-func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInSerialMode")
+defer span.End()
select {
case <-inject:
@@ -95,7 +104,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
-return errors.Errorf("ec2 instance failed to stop, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
@@ -103,14 +112,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
-if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -122,13 +131,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
-return errors.Errorf("ec2 instance failed to start, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-return errors.Errorf("unable to start the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
@@ -140,7 +149,9 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode will inject the ce2 instance termination in parallel mode that is all at once
-func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInParallelMode")
+defer span.End()
select {
case <-inject:
@@ -165,7 +176,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
-return errors.Errorf("ec2 instance failed to stop, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
@@ -174,14 +185,14 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -195,7 +206,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
-return errors.Errorf("ec2 instance failed to start, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
@@ -203,7 +214,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-return errors.Errorf("unable to start the ec2 instance, err: %v", err)
+return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
@@ -216,21 +227,24 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
return nil
}
-//SetTargetInstance will select the target instance which are in running state and filtered from the given instance tag
+// SetTargetInstance will select the target instance which are in running state and filtered from the given instance tag
func SetTargetInstance(experimentsDetails *experimentTypes.ExperimentDetails) error {
-instanceIDList, err := awslib.GetInstanceList(experimentsDetails.InstanceTag, experimentsDetails.Region)
+instanceIDList, err := awslib.GetInstanceList(experimentsDetails.Ec2InstanceTag, experimentsDetails.Region)
if err != nil {
-return err
+return stacktrace.Propagate(err, "failed to get the instance id list")
}
if len(instanceIDList) == 0 {
-return errors.Errorf("no instance found with the given tag %v, in region %v", experimentsDetails.InstanceTag, experimentsDetails.Region)
+return cerrors.Error{
+ErrorCode: cerrors.ErrorTypeTargetSelection,
+Reason: fmt.Sprintf("no instance found with the given tag %v, in region %v", experimentsDetails.Ec2InstanceTag, experimentsDetails.Region),
+}
}
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
-return errors.Errorf("fail to get the instance status while selecting the target instances, err: %v", err)
+return stacktrace.Propagate(err, "failed to get the instance status while selecting the target instances")
}
if instanceState == "running" {
experimentsDetails.TargetInstanceIDList = append(experimentsDetails.TargetInstanceIDList, id)
@@ -238,7 +252,10 @@ func SetTargetInstance(experimentsDetails *experimentTypes.ExperimentDetails) er
}
if len(experimentsDetails.TargetInstanceIDList) == 0 {
-return errors.Errorf("fail to get any running instance having instance tag: %v", experimentsDetails.InstanceTag)
+return cerrors.Error{
+ErrorCode: cerrors.ErrorTypeChaosInject,
+Reason: "failed to get any running instance",
+Target: fmt.Sprintf("EC2 Instance Tag: %v", experimentsDetails.Ec2InstanceTag)}
}
log.InfoWithValues("[Info]: Targeting the running instances filtered from instance tag", logrus.Fields{
@@ -257,19 +274,19 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
-log.Errorf("fail to get instance status when an abort signal is received,err :%v", err)
+log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
-log.Errorf("unable to wait till stop of the instance, err: %v", err)
+log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
-log.Errorf("ec2 instance failed to start when an abort signal is received, err: %v", err)
+log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
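SetTargetInstance above switches to the Ec2InstanceTag field and to typed errors, but the selection logic itself is unchanged: list instances by tag, keep only those reported as running, and fail if nothing qualifies. A stand-alone sketch of that filter; listByTag and status are hypothetical stand-ins for the awslib calls:

package main

import (
	"errors"
	"fmt"
)

func listByTag(tag string) ([]string, error) { return []string{"i-111", "i-222"}, nil }
func status(id string) (string, error)       { return "running", nil }

// selectRunningTargets keeps only instances currently in the running state,
// mirroring the filtering done while picking chaos targets by tag.
func selectRunningTargets(tag string) ([]string, error) {
	ids, err := listByTag(tag)
	if err != nil {
		return nil, fmt.Errorf("failed to get the instance id list: %w", err)
	}
	var targets []string
	for _, id := range ids {
		state, err := status(id)
		if err != nil {
			return nil, fmt.Errorf("failed to get the instance status: %w", err)
		}
		if state == "running" {
			targets = append(targets, id)
		}
	}
	if len(targets) == 0 {
		return nil, errors.New("failed to get any running instance with the given tag")
	}
	return targets, nil
}

func main() {
	targets, err := selectRunningTargets("chaos=true")
	fmt.Println(targets, err)
}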


@@ -1,21 +1,26 @@
package lib
import (
+"context"
+"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
-clients "github.com/litmuschaos/litmus-go/pkg/clients"
+"github.com/litmuschaos/litmus-go/pkg/cerrors"
+"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
+"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
-"github.com/pkg/errors"
+"github.com/palantir/stacktrace"
+"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
@@ -25,7 +30,9 @@ var (
)
// PrepareDiskVolumeLossByLabel contains the prepration and injection steps for the experiment
-func PrepareDiskVolumeLossByLabel(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func PrepareDiskVolumeLossByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPDiskVolumeLossFaultByLabel")
+defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -61,15 +68,15 @@ func PrepareDiskVolumeLossByLabel(computeService *compute.Service, experimentsDe
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
-if err = injectChaosInSerialMode(computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
-if err = injectChaosInParallelMode(computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
-return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
@@ -83,7 +90,9 @@ func PrepareDiskVolumeLossByLabel(computeService *compute.Service, experimentsDe
}
// injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
-func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInSerialMode")
+defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@@ -102,7 +111,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
-return errors.Errorf("disk detachment failed, err: %v", err)
+return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
@@ -110,13 +119,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
+return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
-if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -128,7 +137,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
-return errors.Errorf("failed to get the disk volume status, err: %v", err)
+return stacktrace.Propagate(err, "failed to get the disk volume status")
}
switch diskState {
@@ -138,13 +147,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume back to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
-return errors.Errorf("disk attachment failed, err: %v", err)
+return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for %v volume", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
+return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
@@ -158,7 +167,9 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
}
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode that means all at once
-func injectChaosInParallelMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInParallelMode")
+defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@@ -177,7 +188,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
-return errors.Errorf("disk detachment failed, err: %v", err)
+return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
@@ -188,13 +199,13 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
+return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -208,7 +219,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
-return errors.Errorf("failed to get the disk status, err: %v", err)
+return stacktrace.Propagate(err, "failed to get the disk status")
}
switch diskState {
@@ -218,13 +229,13 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
-return errors.Errorf("disk attachment failed, err: %v", err)
+return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
+return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
@@ -249,25 +260,25 @@ func abortWatcher(computeService *compute.Service, experimentsDetails *experimen
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
-log.Errorf("failed to get the disk state when an abort signal is received, err: %v", err)
+log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to get in detached state then we are attaching it.
-log.Info("[Abort]: Wait for complete disk volume detachment")
+log.Infof("[Abort]: Wait for %s complete disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-log.Errorf("unable to detach the disk volume, err: %v", err)
+log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume from the instance
-log.Info("[Chaos]: Attaching the disk volume from the instance")
+log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
-log.Errorf("disk attachment failed when an abort signal is received, err: %v", err)
+log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}
@@ -285,12 +296,12 @@ func getDeviceNamesAndVMInstanceNames(diskVolumeNamesList []string, computeServi
instanceName, err := gcp.GetVolumeAttachmentDetails(computeService, experimentsDetails.GCPProjectID, experimentsDetails.Zones, diskVolumeNamesList[i])
if err != nil || instanceName == "" {
-return errors.Errorf("failed to get the attachment info, err: %v", err)
+return stacktrace.Propagate(err, "failed to get the disk attachment info")
}
deviceName, err := gcp.GetDiskDeviceNameForVM(computeService, diskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones, instanceName)
if err != nil {
-return err
+return stacktrace.Propagate(err, "failed to fetch the disk device name")
}
experimentsDetails.TargetDiskInstanceNamesList = append(experimentsDetails.TargetDiskInstanceNamesList, instanceName)
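The hunks above thread a context.Context through PrepareDiskVolumeLossByLabel and both inject functions and open an OpenTelemetry span at the top of each phase, so the chaos phases nest under one trace. A minimal, self-contained sketch of that span-per-phase pattern follows; the tracer name and span names are placeholders rather than the values the diff reads from the litmus-go telemetry package.

package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
)

// tracerName is a placeholder; the diff obtains it from a telemetry package.
const tracerName = "example-tracer"

// prepareFault shows the pattern used in the diff: accept a context, start a
// span named after the phase, and hand the derived context to the next stage
// so its span becomes a child of this one.
func prepareFault(ctx context.Context) error {
	ctx, span := otel.Tracer(tracerName).Start(ctx, "PrepareExampleFault")
	defer span.End()

	return injectInSerialMode(ctx)
}

func injectInSerialMode(ctx context.Context) error {
	_, span := otel.Tracer(tracerName).Start(ctx, "InjectExampleFaultInSerialMode")
	defer span.End()

	// chaos injection steps would run here
	return nil
}

func main() {
	// With no tracer provider registered, the no-op provider is used and the
	// calls are harmless; a real run would configure an exporter first.
	if err := prepareFault(context.Background()); err != nil {
		fmt.Println("error:", err)
	}
}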


@@ -1,21 +1,27 @@
package lib
import (
+"context"
+"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
+"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
-gcp "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
+"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
+"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
+"github.com/palantir/stacktrace"
"github.com/pkg/errors"
+"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
@@ -25,7 +31,9 @@ var (
)
// PrepareDiskVolumeLoss contains the prepration and injection steps for the experiment
-func PrepareDiskVolumeLoss(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func PrepareDiskVolumeLoss(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMDiskLossFault")
+defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -51,7 +59,7 @@ func PrepareDiskVolumeLoss(computeService *compute.Service, experimentsDetails *
//get the device names for the given disks
if err := getDeviceNamesList(computeService, experimentsDetails, diskNamesList, diskZonesList); err != nil {
-return err
+return stacktrace.Propagate(err, "failed to fetch the disk device names")
}
select {
@@ -65,15 +73,15 @@ func PrepareDiskVolumeLoss(computeService *compute.Service, experimentsDetails *
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
-if err = injectChaosInSerialMode(computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
-if err = injectChaosInParallelMode(computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
-return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
@@ -87,8 +95,9 @@ func PrepareDiskVolumeLoss(computeService *compute.Service, experimentsDetails *
}
// injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
-func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInSerialMode")
+defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
@@ -103,23 +112,23 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
-log.Info("[Chaos]: Detaching the disk volume from the instance")
+log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
-return errors.Errorf("disk detachment failed, err: %v", err)
+return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
//Wait for disk volume detachment
-log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
+log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
+return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
-if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -131,23 +140,23 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
-return errors.Errorf("failed to get the disk volume status, err: %v", err)
+return stacktrace.Propagate(err, fmt.Sprintf("failed to get %s disk volume status", targetDiskVolumeNamesList[i]))
}
switch diskState {
case "attached":
-log.Info("[Skip]: The disk volume is already attached")
+log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
-log.Info("[Chaos]: Attaching the disk volume back to the instance")
+log.Infof("[Chaos]: Attaching %s disk volume back to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
-return errors.Errorf("disk attachment failed, err: %v", err)
+return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
-log.Infof("[Wait]: Wait for disk volume attachment for %v volume", targetDiskVolumeNamesList[i])
+log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
+return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
@@ -158,7 +167,9 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
}
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode that means all at once
-func injectChaosInParallelMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInParallelMode")
+defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@@ -175,9 +186,9 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
-log.Info("[Chaos]: Detaching the disk volume from the instance")
+log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
-return errors.Errorf("disk detachment failed, err: %v", err)
+return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
@@ -186,15 +197,15 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
for i := range targetDiskVolumeNamesList {
//Wait for disk volume detachment
-log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
+log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
+return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -213,18 +224,18 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
switch diskState {
case "attached":
-log.Info("[Skip]: The disk volume is already attached")
+log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
-log.Info("[Chaos]: Attaching the disk volume to the instance")
+log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
-return errors.Errorf("disk attachment failed, err: %v", err)
+return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
-log.Infof("[Wait]: Wait for disk volume attachment for volume %v", targetDiskVolumeNamesList[i])
+log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
+return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
@@ -246,25 +257,25 @@ func abortWatcher(computeService *compute.Service, experimentsDetails *experimen
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
-log.Errorf("failed to get the disk state when an abort signal is received, err: %v", err)
+log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to get in detached state then we are attaching it.
-log.Info("[Abort]: Wait for complete disk volume detachment")
+log.Infof("[Abort]: Wait for complete disk volume detachment for %s", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
-log.Errorf("unable to detach the disk volume, err: %v", err)
+log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume from the instance
-log.Info("[Chaos]: Attaching the disk volume from the instance")
+log.Infof("[Chaos]: Attaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
-log.Errorf("disk attachment failed when an abort signal is received, err: %v", err)
+log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}
}
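Most error returns in this file switch from errors.Errorf, which flattens the cause into a new formatted string, to stacktrace.Propagate from github.com/palantir/stacktrace, which keeps the original error and records the call site. A small sketch of the wrapping style is shown below; the failing call is invented purely for illustration.

package main

import (
	"fmt"

	"github.com/palantir/stacktrace"
)

// detachDisk stands in for a GCP API call; the name and failure are made up
// solely to demonstrate the wrapping style adopted in the diff.
func detachDisk() error {
	return fmt.Errorf("googleapi: operation timed out")
}

func main() {
	// Old style: errors.Errorf("disk detachment failed, err: %v", err) produced
	// a flat string. New style: stacktrace.Propagate wraps the cause and adds
	// file/line context, so the original error remains part of the chain.
	if err := detachDisk(); err != nil {
		wrapped := stacktrace.Propagate(err, "disk detachment failed")
		fmt.Println(wrapped)
	}
}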


@ -1,28 +1,35 @@
package lib package lib
import ( import (
"context"
"fmt"
"os" "os"
"os/signal" "os/signal"
"strings" "strings"
"syscall" "syscall"
"time" "time"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp" gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1" "google.golang.org/api/compute/v1"
) )
var inject, abort chan os.Signal var inject, abort chan os.Signal
// PrepareVMStopByLabel executes the experiment steps by injecting chaos into target VM instances // PrepareVMStopByLabel executes the experiment steps by injecting chaos into target VM instances
func PrepareVMStopByLabel(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareVMStopByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPVMInstanceStopFaultByLabel")
defer span.End()
// inject channel is used to transmit signal notifications. // inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1) inject = make(chan os.Signal, 1)
@ -48,15 +55,15 @@ func PrepareVMStopByLabel(computeService *compute.Service, experimentsDetails *e
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err := injectChaosInSerialMode(computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err := injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err := injectChaosInParallelMode(computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err := injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -69,7 +76,9 @@ func PrepareVMStopByLabel(computeService *compute.Service, experimentsDetails *e
} }
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other // injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInSerialMode")
defer span.End()
select { select {
case <-inject: case <-inject:
@ -96,7 +105,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Stopping the VM instance //Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i]) log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("VM instance failed to stop, err: %v", err) return stacktrace.Propagate(err, "VM instance failed to stop")
} }
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails) common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
@ -104,13 +113,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Wait for VM instance to completely stop //Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to stop", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to stop", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("%s vm instance failed to fully shutdown, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
} }
// run the probes during chaos // run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration // the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 { if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -125,7 +134,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
// wait for VM instance to get in running state // wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "unable to start %s vm instance")
} }
default: default:
@ -133,13 +142,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
// starting the VM instance // starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i]) log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("%s vm instance failed to start, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to start")
} }
// wait for VM instance to get in running state // wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "unable to start %s vm instance")
} }
} }
@ -154,8 +163,9 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
} }
// injectChaosInParallelMode will inject the VM instance termination in serial mode that is one after other // injectChaosInParallelMode will inject the VM instance termination in serial mode that is one after other
func injectChaosInParallelMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInParallelMode")
defer span.End()
select { select {
case <-inject: case <-inject:
// stopping the chaos execution, if abort signal received // stopping the chaos execution, if abort signal received
@ -181,7 +191,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
// stopping the VM instance // stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i]) log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("%s vm instance failed to stop, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to stop")
} }
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails) common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
@ -192,13 +202,13 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
// wait for VM instance to completely stop // wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("%s vm instance failed to fully shutdown, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
} }
} }
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -215,7 +225,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("unable to start the vm instance, err: %v", err) return stacktrace.Propagate(err, "unable to start the vm instance")
} }
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails) common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
@ -228,7 +238,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Info("[Chaos]: Starting back the VM instance") log.Info("[Chaos]: Starting back the VM instance")
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("vm instance failed to start, err: %v", err) return stacktrace.Propagate(err, "vm instance failed to start")
} }
} }
@ -237,7 +247,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("unable to start the vm instance, err: %v", err) return stacktrace.Propagate(err, "unable to start the vm instance")
} }
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails) common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
@ -260,19 +270,19 @@ func abortWatcher(computeService *compute.Service, experimentsDetails *experimen
for i := range instanceNamesList { for i := range instanceNamesList {
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones) instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil { if err != nil {
log.Errorf("fail to get instance status when an abort signal is received,err :%v", err) log.Errorf("Failed to get %s instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
} }
if instanceState != "RUNNING" && experimentsDetails.ManagedInstanceGroup != "enable" { if instanceState != "RUNNING" && experimentsDetails.ManagedInstanceGroup != "enable" {
log.Info("[Abort]: Waiting for the VM instance to shut down") log.Info("[Abort]: Waiting for the VM instance to shut down")
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil { if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
log.Errorf("unable to wait till stop of the instance, err: %v", err) log.Errorf("Unable to wait till stop of %s instance, err: %v", instanceNamesList[i], err)
} }
log.Info("[Abort]: Starting VM instance as abort signal received") log.Info("[Abort]: Starting VM instance as abort signal received")
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones) err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil { if err != nil {
log.Errorf("vm instance failed to start when an abort signal is received, err: %v", err) log.Errorf("%s instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
} }
} }
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails) common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
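The revert and wait paths in these GCP files move from errors.Errorf formatting to propagated errors. A minimal, self-contained sketch of that wrapping style, using a hypothetical stopInstance call in place of the real gcplib operations, might look like:

package main

import (
	"fmt"

	"github.com/palantir/stacktrace"
)

// stopInstance is a hypothetical stand-in for a GCP call such as VMInstanceStop.
func stopInstance(name string) error {
	return fmt.Errorf("compute API returned 403 for %s", name)
}

func revert(name string) error {
	if err := stopInstance(name); err != nil {
		// Propagate keeps the root cause and adds call-site context, which is
		// the pattern the new return statements in this diff rely on.
		return stacktrace.Propagate(err, "vm instance failed to stop")
	}
	return nil
}

func main() {
	if err := revert("demo-instance"); err != nil {
		// RootCause unwraps back to the original error when needed.
		fmt.Println(err, "| root cause:", stacktrace.RootCause(err))
	}
}

Unsupported inputs, by contrast, are reported as typed cerrors.Error values (see the sequence switch below) rather than wrapped causes.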


@ -1,21 +1,26 @@
package lib package lib
import ( import (
"context"
"fmt"
"os" "os"
"os/signal" "os/signal"
"strings" "strings"
"syscall" "syscall"
"time" "time"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp" gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1" "google.golang.org/api/compute/v1"
) )
@ -25,7 +30,9 @@ var (
) )
// PrepareVMStop contains the prepration and injection steps for the experiment // PrepareVMStop contains the prepration and injection steps for the experiment
func PrepareVMStop(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareVMStop(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications. // inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1) inject = make(chan os.Signal, 1)
@ -53,15 +60,15 @@ func PrepareVMStop(computeService *compute.Service, experimentsDetails *experime
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
// wait for the ramp time after chaos injection // wait for the ramp time after chaos injection
@ -74,7 +81,9 @@ func PrepareVMStop(computeService *compute.Service, experimentsDetails *experime
} }
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other // injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInSerialMode")
defer span.End()
select { select {
case <-inject: case <-inject:
@ -101,7 +110,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Stopping the VM instance //Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i]) log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s VM instance failed to stop, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to stop")
} }
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails) common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
@ -109,13 +118,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Wait for VM instance to completely stop //Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to fully shutdown, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
} }
// run the probes during chaos // run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration // the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 { if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -130,13 +139,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
// starting the VM instance // starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i]) log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to start, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to start")
} }
// wait for VM instance to get in running state // wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "unable to start vm instance")
} }
default: default:
@ -144,7 +153,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
// wait for VM instance to get in running state // wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "unable to start vm instance")
} }
} }
@ -159,7 +168,9 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
} }
// injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once // injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once
func injectChaosInParallelMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInParallelMode")
defer span.End()
select { select {
case <-inject: case <-inject:
@ -186,7 +197,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
// stopping the VM instance // stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i]) log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to stop, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to stop")
} }
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails) common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
@ -197,13 +208,13 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
// wait for VM instance to completely stop // wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to fully shutdown, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to fully shutdown")
} }
} }
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -219,7 +230,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
for i := range instanceNamesList { for i := range instanceNamesList {
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i]) log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to start, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "vm instance failed to start")
} }
} }
@ -228,7 +239,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "unable to start vm instance")
} }
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails) common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
@ -241,7 +252,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i]) log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil { if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err) return stacktrace.Propagate(err, "unable to start vm instance")
} }
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails) common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
@ -267,20 +278,20 @@ func abortWatcher(computeService *compute.Service, experimentsDetails *experimen
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i]) instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil { if err != nil {
log.Errorf("failed to get %s vm instance status when an abort signal is received, err: %v", instanceNamesList[i], err) log.Errorf("Failed to get %s vm instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
} }
if instanceState != "RUNNING" { if instanceState != "RUNNING" {
log.Infof("[Abort]: Waiting for %s VM instance to shut down", instanceNamesList[i]) log.Infof("[Abort]: Waiting for %s VM instance to shut down", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i]); err != nil { if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i]); err != nil {
log.Errorf("unable to wait till stop of the instance, err: %v", err) log.Errorf("Unable to wait till stop of %s instance, err: %v", instanceNamesList[i], err)
} }
log.Infof("[Abort]: Starting %s VM instance as abort signal is received", instanceNamesList[i]) log.Infof("[Abort]: Starting %s VM instance as abort signal is received", instanceNamesList[i])
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i]) err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil { if err != nil {
log.Errorf("%s vm instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err) log.Errorf("%s VM instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
} }
} }
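Each prepare/inject function in these GCP files now takes a context.Context and opens an OpenTelemetry span before doing any work. A minimal sketch of that pattern, with a placeholder span name and assuming telemetry.TracerName is the shared tracer-name constant used throughout this diff, is:

package lib

import (
	"context"

	"github.com/litmuschaos/litmus-go/pkg/telemetry"
	"go.opentelemetry.io/otel"
)

// prepareSomeFault is illustrative; the real functions follow the same shape
// (PrepareVMInstanceStopFault, InjectVMInstanceStopFaultInSerialMode, ...).
func prepareSomeFault(ctx context.Context) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareSomeFault")
	defer span.End()

	// Fault preparation and injection run here, passing ctx downstream so that
	// nested spans (and the TRACE_PARENT value handed to helper pods) derive
	// from this one.
	_ = ctx
	return nil
}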


@ -1,12 +1,16 @@
package helper package helper
import ( import (
"bytes" "context"
"fmt" "fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os" "os"
"os/exec"
"os/signal" "os/signal"
"strconv" "strconv"
"strings"
"syscall" "syscall"
"time" "time"
@ -17,7 +21,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/result" "github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
clientTypes "k8s.io/apimachinery/pkg/types" clientTypes "k8s.io/apimachinery/pkg/types"
) )
@ -27,7 +30,9 @@ var (
) )
// Helper injects the http chaos // Helper injects the http chaos
func Helper(clients clients.ClientSets) { func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodHTTPFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{} experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{} eventsDetails := types.EventDetails{}
@ -48,10 +53,11 @@ func Helper(clients clients.ClientSets) {
log.Info("[PreReq]: Getting the ENV variables") log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails) getENV(&experimentsDetails)
// Intialise the chaos attributes // Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails) types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise Chaos Result Parameters // Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails) types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid // Set the chaos result uid
@ -59,22 +65,67 @@ func Helper(clients clients.ClientSets) {
err := prepareK8sHttpChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails) err := prepareK8sHttpChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil { if err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err) log.Fatalf("helper pod failed, err: %v", err)
} }
} }
// prepareK8sHttpChaos contains the prepration steps before chaos injection // prepareK8sHttpChaos contains the preparation steps before chaos injection
func prepareK8sHttpChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error { func prepareK8sHttpChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
containerID, err := common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, experimentsDetails.TargetPods, experimentsDetails.AppNS, experimentsDetails.TargetContainer, clients) targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not parse targets")
} }
// extract out the pid of the target container
targetPID, err := common.GetPauseAndSandboxPID(experimentsDetails.ContainerRuntime, containerID, experimentsDetails.SocketPath) var targets []targetDetails
if err != nil {
return err for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPauseAndSandboxPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
for _, t := range targets {
// injecting http chaos inside target container
if err = injectChaos(experimentsDetails, t); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaos(experimentsDetails, t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
} }
// record the event inside chaosengine // record the event inside chaosengine
@ -84,71 +135,67 @@ func prepareK8sHttpChaos(experimentsDetails *experimentTypes.ExperimentDetails,
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine") events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
} }
// watching for the abort signal and revert the chaos
go abortWatcher(targetPID, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails)
// injecting http chaos inside target container
if err = injectChaos(experimentsDetails, targetPID); err != nil {
return err
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration) log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration) common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment") log.Info("[Chaos]: chaos duration is over, reverting chaos")
// cleaning the netem process after chaos injection var errList []string
if err = revertChaos(experimentsDetails, targetPID); err != nil { for _, t := range targets {
return err // cleaning the ip rules process after chaos injection
err := revertChaos(experimentsDetails, t)
if err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
} }
return result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods) if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
} }
// injectChaos inject the http chaos in target container and add ruleset to the iptables to redirect the ports // injectChaos inject the http chaos in target container and add ruleset to the iptables to redirect the ports
func injectChaos(experimentDetails *experimentTypes.ExperimentDetails, pid int) error { func injectChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
if err := startProxy(experimentDetails, t.Pid); err != nil {
select { killErr := killProxy(t.Pid, t.Source)
case <-inject: if killErr != nil {
// stopping the chaos execution, if abort signal received return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
os.Exit(1)
default:
// proceed for chaos injection
if err := startProxy(experimentDetails, pid); err != nil {
_ = killProxy(pid)
return errors.Errorf("failed to start proxy, err: %v", err)
} }
if err := addIPRuleSet(experimentDetails, pid); err != nil { return stacktrace.Propagate(err, "could not start proxy server")
_ = killProxy(pid) }
return errors.Errorf("failed to add ip rule set, err: %v", err) if err := addIPRuleSet(experimentDetails, t.Pid); err != nil {
killErr := killProxy(t.Pid, t.Source)
if killErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
} }
return stacktrace.Propagate(err, "could not add ip rules")
} }
return nil return nil
} }
// revertChaos revert the http chaos in target container // revertChaos revert the http chaos in target container
func revertChaos(experimentDetails *experimentTypes.ExperimentDetails, pid int) error { func revertChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
var revertError error var errList []string
revertError = nil
if err := removeIPRuleSet(experimentDetails, pid); err != nil { if err := removeIPRuleSet(experimentDetails, t.Pid); err != nil {
revertError = errors.Errorf("failed to remove ip rule set, err: %v", err) errList = append(errList, err.Error())
} }
if err := killProxy(pid); err != nil { if err := killProxy(t.Pid, t.Source); err != nil {
if revertError != nil { errList = append(errList, err.Error())
revertError = errors.Errorf("%v and failed to kill proxy server, err: %v", revertError, err)
} else {
revertError = errors.Errorf("failed to kill proxy server, err: %v", err)
}
} }
return revertError if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
return nil
} }
// startProxy starts the proxy process inside the target container // startProxy starts the proxy process inside the target container
@ -169,7 +216,7 @@ func startProxy(experimentDetails *experimentTypes.ExperimentDetails, pid int) e
log.Infof("[Chaos]: Starting proxy server") log.Infof("[Chaos]: Starting proxy server")
if err := runCommand(chaosCommand); err != nil { if err := common.RunBashCommand(chaosCommand, "failed to start proxy server", experimentDetails.ChaosPodName); err != nil {
return err return err
} }
@ -177,14 +224,16 @@ func startProxy(experimentDetails *experimentTypes.ExperimentDetails, pid int) e
return nil return nil
} }
const NoProxyToKill = "you need to specify whom to kill"
// killProxy kills the proxy process inside the target container // killProxy kills the proxy process inside the target container
// it is using nsenter command to enter into network namespace of target container // it is using nsenter command to enter into network namespace of target container
// and execute the proxy related command inside it. // and execute the proxy related command inside it.
func killProxy(pid int) error { func killProxy(pid int, source string) error {
stopProxyServerCommand := fmt.Sprintf("sudo nsenter -t %d -n sudo kill -9 $(ps aux | grep [t]oxiproxy | awk 'FNR==1{print $1}')", pid) stopProxyServerCommand := fmt.Sprintf("sudo nsenter -t %d -n sudo kill -9 $(ps aux | grep [t]oxiproxy | awk 'FNR==2{print $2}')", pid)
log.Infof("[Chaos]: Stopping proxy server") log.Infof("[Chaos]: Stopping proxy server")
if err := runCommand(stopProxyServerCommand); err != nil { if err := common.RunBashCommand(stopProxyServerCommand, "failed to stop proxy server", source); err != nil {
return err return err
} }
@ -202,7 +251,7 @@ func addIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int)
addIPRuleSetCommand := fmt.Sprintf("(sudo nsenter -t %d -n iptables -t nat -I PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d)", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort) addIPRuleSetCommand := fmt.Sprintf("(sudo nsenter -t %d -n iptables -t nat -I PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d)", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Adding IPtables ruleset") log.Infof("[Chaos]: Adding IPtables ruleset")
if err := runCommand(addIPRuleSetCommand); err != nil { if err := common.RunBashCommand(addIPRuleSetCommand, "failed to add ip rules", experimentDetails.ChaosPodName); err != nil {
return err return err
} }
@ -210,6 +259,8 @@ func addIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int)
return nil return nil
} }
const NoIPRulesetToRemove = "No chain/target/match by that name"
// removeIPRuleSet removes the ip rule set from iptables in target container // removeIPRuleSet removes the ip rule set from iptables in target container
// it is using nsenter command to enter into network namespace of target container // it is using nsenter command to enter into network namespace of target container
// and execute the iptables related command inside it. // and execute the iptables related command inside it.
@ -217,7 +268,7 @@ func removeIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid i
removeIPRuleSetCommand := fmt.Sprintf("sudo nsenter -t %d -n iptables -t nat -D PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort) removeIPRuleSetCommand := fmt.Sprintf("sudo nsenter -t %d -n iptables -t nat -D PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Removing IPtables ruleset") log.Infof("[Chaos]: Removing IPtables ruleset")
if err := runCommand(removeIPRuleSetCommand); err != nil { if err := common.RunBashCommand(removeIPRuleSetCommand, "failed to remove ip rules", experimentDetails.ChaosPodName); err != nil {
return err return err
} }
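The shell commands assembled above all run inside the target's network namespace via nsenter: one iptables rule redirects service traffic to the proxy, its -D counterpart removes it, and a kill command stops the toxiproxy process. A standalone sketch of those strings with hypothetical pid and port values:

package main

import "fmt"

func main() {
	pid := 12345       // hypothetical pid of the target container's sandbox process
	iface := "eth0"    // network interface inside the target pod (experimentDetails.NetworkInterface)
	servicePort := 80  // TARGET_SERVICE_PORT
	proxyPort := 20000 // PROXY_PORT

	// Redirect inbound TCP traffic for the service port to the proxy port in
	// the target's network namespace (nat PREROUTING + REDIRECT).
	add := fmt.Sprintf("sudo nsenter -t %d -n iptables -t nat -I PREROUTING -i %s -p tcp --dport %d -j REDIRECT --to-port %d",
		pid, iface, servicePort, proxyPort)

	// Reverting deletes the same rule (-D instead of -I).
	del := fmt.Sprintf("sudo nsenter -t %d -n iptables -t nat -D PREROUTING -i %s -p tcp --dport %d -j REDIRECT --to-port %d",
		pid, iface, servicePort, proxyPort)

	// Stopping the proxy kills the toxiproxy process, reading the PID column
	// ($2) from the ps output as the updated command above does.
	kill := fmt.Sprintf("sudo nsenter -t %d -n sudo kill -9 $(ps aux | grep [t]oxiproxy | awk 'FNR==2{print $2}')", pid)

	fmt.Println(add)
	fmt.Println(del)
	fmt.Println(kill)
}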
@ -229,10 +280,6 @@ func removeIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid i
func getENV(experimentDetails *experimentTypes.ExperimentDetails) { func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "") experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "") experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus") experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "") experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@ -246,27 +293,8 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.Toxicity, _ = strconv.Atoi(types.Getenv("TOXICITY", "100")) experimentDetails.Toxicity, _ = strconv.Atoi(types.Getenv("TOXICITY", "100"))
} }
func runCommand(chaosCommand string) error {
var stdout, stderr bytes.Buffer
cmd := exec.Command("/bin/bash", "-c", chaosCommand)
cmd.Stdout = &stdout
cmd.Stderr = &stderr
err = cmd.Run()
errStr := stderr.String()
if err != nil {
// if we get standard error then, return the same
if errStr != "" {
return errors.New(errStr)
}
// if not standard error found, return error
return err
}
return nil
}
// abortWatcher continuously watch for the abort signals // abortWatcher continuously watch for the abort signals
func abortWatcher(targetPID int, resultName, chaosNS string, experimentDetails *experimentTypes.ExperimentDetails) { func abortWatcher(targets []targetDetails, resultName, chaosNS string, experimentDetails *experimentTypes.ExperimentDetails) {
<-abort <-abort
log.Info("[Abort]: Killing process started because of terminated signal received") log.Info("[Abort]: Killing process started because of terminated signal received")
@ -274,23 +302,31 @@ func abortWatcher(targetPID int, resultName, chaosNS string, experimentDetails *
retry := 3 retry := 3
for retry > 0 { for retry > 0 {
if err = revertChaos(experimentDetails, targetPID); err != nil { for _, t := range targets {
retry-- if err = revertChaos(experimentDetails, t); err != nil {
// If retries are left if strings.Contains(err.Error(), NoIPRulesetToRemove) && strings.Contains(err.Error(), NoProxyToKill) {
if retry > 0 { continue
log.Errorf("[Abort]: Failed to revert chaos, retrying %d more times, err: %v", retry, err) }
time.Sleep(1 * time.Second) log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
continue continue
} }
// else exit with error if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("[Abort]: Chaos Revert Failed") log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
os.Exit(1) }
} }
retry--
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", experimentDetails.TargetPods); err != nil { time.Sleep(1 * time.Second)
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
} }
log.Info("Chaos Revert Completed")
os.Exit(1)
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
Source string
} }
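The helper now learns its targets from a single TARGETS environment variable instead of APP_NAMESPACE/APP_POD/APP_CONTAINER; the lib file later in this diff assembles that value as semicolon-separated name:namespace:container triples. A hypothetical standalone parse into the targetDetails struct above (the real code goes through common.ParseTargets and also resolves container IDs and PIDs) could look like this, assuming the package's existing fmt and strings imports:

// parseTargets is illustrative only, not the project's common.ParseTargets.
func parseTargets(raw string) ([]targetDetails, error) {
	var targets []targetDetails
	for _, entry := range strings.Split(raw, ";") {
		parts := strings.Split(entry, ":")
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed target %q, expected name:namespace:container", entry)
		}
		targets = append(targets, targetDetails{
			Name:            parts[0],
			Namespace:       parts[1],
			TargetContainer: parts[2],
		})
	}
	return targets, nil
}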


@ -1,16 +1,22 @@
package header package header
import ( import (
"context"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib" http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
) )
//PodHttpModifyHeaderChaos contains the steps to prepare and inject http modify header chaos // PodHttpModifyHeaderChaos contains the steps to prepare and inject http modify header chaos
func PodHttpModifyHeaderChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PodHttpModifyHeaderChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyHeaderFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{ log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort, "Target Port": experimentsDetails.TargetServicePort,
@ -27,5 +33,5 @@ func PodHttpModifyHeaderChaos(experimentsDetails *experimentTypes.ExperimentDeta
stream = "upstream" stream = "upstream"
} }
args := "-t header --" + stream + " -a headers='" + (experimentsDetails.HeadersMap) + "' -a mode=" + experimentsDetails.HeaderMode args := "-t header --" + stream + " -a headers='" + (experimentsDetails.HeadersMap) + "' -a mode=" + experimentsDetails.HeaderMode
return http_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args) return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
} }
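As a worked example, if the stream resolves to upstream (the branch shown above) and the tunables are the hypothetical values HeadersMap={"X-Litmus-Test": "1"} and HeaderMode=add, the assembled args passed to the helper would read:

	-t header --upstream -a headers='{"X-Litmus-Test": "1"}' -a mode=add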


@ -2,64 +2,45 @@ package lib
import ( import (
"context" "context"
"fmt"
"os"
"strconv" "strconv"
"strings" "strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1" apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
//PrepareAndInjectChaos contains the preparation & injection steps // PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error { func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
targetPodList := apiv1.PodList{}
var err error var err error
var podsAffectedPerc int
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
} }
//set up the tunables if provided in range //set up the tunables if provided in range
SetChaosTunables(experimentsDetails) SetChaosTunables(experimentsDetails)
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc) targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if experimentsDetails.NodeLabel == "" { if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
} }
var podNames []string
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("[Info]: Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime) log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@ -70,42 +51,42 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
if experimentsDetails.ChaosServiceAccount == "" { if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients) experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil { if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err) return stacktrace.Propagate(err, "could not get experiment service account")
} }
} }
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil { if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err return stacktrace.Propagate(err, "could not set helper data")
} }
} }
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "") experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
return nil return nil
} }
// injectChaosInSerialMode inject the http chaos in all target application serially (one by one) // injectChaosInSerialMode inject the http chaos in all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInSerialMode")
defer span.End()
var err error
labelSuffix := common.GetRunID()
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -115,10 +96,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{ log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
@ -126,33 +104,16 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"NodeName": pod.Spec.NodeName, "NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer, "ContainerName": experimentsDetails.TargetContainer,
}) })
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix); err != nil { runID := stringutils.GetRunID()
return errors.Errorf("unable to create the helper pod, err: %v", err) if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
} }
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for http chaos
log.Info("[Cleanup]: Deleting the the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
} }
} }
@ -160,79 +121,54 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
// injectChaosInParallelMode inject the http chaos in all target application in parallel mode (all at once) // injectChaosInParallelMode inject the http chaos in all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
// creating the helper pod to perform http chaos runID := stringutils.GetRunID()
for _, pod := range targetPodList.Items { targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//Get the target container name of the application pod for node, tar := range targets {
//It checks the empty target container for the first iteration only var targetsPerNode []string
if !experimentsDetails.IsTargetContainerProvided { for _, k := range tar.Target {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{ if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
"PodName": pod.Name, return stacktrace.Propagate(err, "could not create helper pod")
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
} }
} }
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
// Deleting all the helper pod for http chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
} }
return nil return nil
} }
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID, args string) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateHTTPChaosHelperPod")
	defer span.End()

	privilegedEnable := true
	terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)

	helperPod := &apiv1.Pod{
		ObjectMeta: v1.ObjectMeta{
			GenerateName: experimentsDetails.ExperimentName + "-helper-",
			Namespace:    experimentsDetails.ChaosNamespace,
			Labels:       common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
			Annotations:  chaosDetails.Annotations,
		},
		Spec: apiv1.PodSpec{
			HostPID: true,
@@ -265,7 +201,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
				"./helpers -name http-chaos",
			},
			Resources:    chaosDetails.Resources,
			Env:          getPodEnv(ctx, experimentsDetails, targets, args),
			VolumeMounts: []apiv1.VolumeMount{
				{
					Name: "cri-socket",
@@ -286,18 +222,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
		},
	}

	if len(chaosDetails.SideCar) != 0 {
		helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
		helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
	}

	if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
	}

	return nil
}

// getPodEnv derive all the env required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets, args string) []apiv1.EnvVar {
	var envDetails common.ENVDetails
	envDetails.SetEnv("TARGETS", targets).
		SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
		SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
		SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@@ -310,13 +251,15 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, a
		SetEnv("TARGET_SERVICE_PORT", strconv.Itoa(experimentsDetails.TargetServicePort)).
		SetEnv("PROXY_PORT", strconv.Itoa(experimentsDetails.ProxyPort)).
		SetEnv("TOXICITY", strconv.Itoa(experimentsDetails.Toxicity)).
		SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
		SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
		SetEnvFromDownwardAPI("v1", "metadata.name")

	return envDetails.ENV
}

// SetChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
	experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
	experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
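A minimal usage sketch of the tunables above, assuming common.ValidateRange resolves a "min-max" string to one random value in that range and common.GetRandomSequence resolves "random" to one of the supported sequences; the literal values are hypothetical:

	experimentsDetails.PodsAffectedPerc = "30-60" // a range: resolved to a single value in [30,60], e.g. "42"
	experimentsDetails.Sequence = "random"        // resolved to either "serial" or "parallel"
	SetChaosTunables(&experimentsDetails)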


@@ -1,18 +1,23 @@
package latency

import (
	"context"
	"strconv"

	http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
	"github.com/litmuschaos/litmus-go/pkg/clients"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
	"github.com/litmuschaos/litmus-go/pkg/telemetry"
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/sirupsen/logrus"
	"go.opentelemetry.io/otel"
)

// PodHttpLatencyChaos contains the steps to prepare and inject http latency chaos
func PodHttpLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpLatencyFault")
	defer span.End()

	log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
		"Target Port": experimentsDetails.TargetServicePort,
@@ -24,5 +29,5 @@ func PodHttpLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails,
	})

	args := "-t latency -a latency=" + strconv.Itoa(experimentsDetails.Latency)
	return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
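For reference, with a hypothetical LATENCY of 2000 ms the args string handed down to the helper resolves to:

	args := "-t latency -a latency=" + strconv.Itoa(2000)
	// args == "-t latency -a latency=2000"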


@@ -1,20 +1,25 @@
package modifybody

import (
	"context"
	"fmt"
	"math"
	"strings"

	http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
	"github.com/litmuschaos/litmus-go/pkg/clients"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
	"github.com/litmuschaos/litmus-go/pkg/telemetry"
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/sirupsen/logrus"
	"go.opentelemetry.io/otel"
)

// PodHttpModifyBodyChaos contains the steps to prepare and inject http modify body chaos
func PodHttpModifyBodyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyBodyFault")
	defer span.End()

	// responseBodyMaxLength defines the max length of response body string to be printed. It is taken as
	// the min of length of body and 120 characters to avoid printing large response body.
@@ -34,7 +39,7 @@ func PodHttpModifyBodyChaos(experimentsDetails *experimentTypes.ExperimentDetail
	args := fmt.Sprintf(
		`-t modify_body -a body="%v" -a content_type=%v -a content_encoding=%v`,
		EscapeQuotes(experimentsDetails.ResponseBody), experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
	return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
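For illustration, a hypothetical response body containing quotes produces an args string of roughly the following shape, assuming EscapeQuotes (defined below) backslash-escapes the double quotes; the body, content type, and encoding here are example values:

	responseBody := `{"status":"chaos"}`
	args := fmt.Sprintf(`-t modify_body -a body="%v" -a content_type=%v -a content_encoding=%v`,
		EscapeQuotes(responseBody), "application/json", "gzip")
	// e.g. -t modify_body -a body="{\"status\":\"chaos\"}" -a content_type=application/json -a content_encoding=gzip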
// EscapeQuotes escapes the quotes in the given string


@@ -1,18 +1,23 @@
package reset

import (
	"context"
	"strconv"

	http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
	"github.com/litmuschaos/litmus-go/pkg/clients"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
	"github.com/litmuschaos/litmus-go/pkg/telemetry"
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/sirupsen/logrus"
	"go.opentelemetry.io/otel"
)

// PodHttpResetPeerChaos contains the steps to prepare and inject http reset peer chaos
func PodHttpResetPeerChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPResetPeerFault")
	defer span.End()

	log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
		"Target Port": experimentsDetails.TargetServicePort,
@@ -24,5 +29,5 @@ func PodHttpResetPeerChaos(experimentsDetails *experimentTypes.ExperimentDetails
	})

	args := "-t reset_peer -a timeout=" + strconv.Itoa(experimentsDetails.ResetTimeout)
	return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@@ -1,6 +1,7 @@
package statuscode

import (
	"context"
	"fmt"
	"math"
	"math/rand"
@@ -8,13 +9,16 @@ import (
	"strings"
	"time"

	"github.com/litmuschaos/litmus-go/pkg/cerrors"
	"github.com/litmuschaos/litmus-go/pkg/telemetry"
	"go.opentelemetry.io/otel"

	http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
	body "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib/modify-body"
	"github.com/litmuschaos/litmus-go/pkg/clients"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/sirupsen/logrus"
)

@@ -26,7 +30,9 @@ var acceptedStatusCodes = []string{
}

// PodHttpStatusCodeChaos contains the steps to prepare and inject http status code chaos
func PodHttpStatusCodeChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpStatusCodeFault")
	defer span.End()

	// responseBodyMaxLength defines the max length of response body string to be printed. It is taken as
	// the min of length of body and 120 characters to avoid printing large response body.
@@ -49,7 +55,7 @@ func PodHttpStatusCodeChaos(experimentsDetails *experimentTypes.ExperimentDetail
		`-t status_code -a status_code=%s -a modify_response_body=%d -a response_body="%v" -a content_type=%s -a content_encoding=%s`,
		experimentsDetails.StatusCode, stringBoolToInt(experimentsDetails.ModifyResponseBody), body.EscapeQuotes(experimentsDetails.ResponseBody),
		experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
	return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}

// GetStatusCode performs two functions:
@@ -71,11 +77,11 @@ func GetStatusCode(statusCode string) (string, error) {
	} else {
		acceptedCodes := getAcceptedCodesInList(statusCodeList, acceptedStatusCodes)
		if len(acceptedCodes) == 0 {
			return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("invalid status code: %s", statusCode)}
		}
		return acceptedCodes[rand.Intn(len(acceptedCodes))], nil
	}
	return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("status code '%s' is not supported. Supported status codes are: %v", statusCode, acceptedStatusCodes)}
}
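A hedged usage sketch of GetStatusCode, assuming the STATUS_CODE tunable is given as a comma-separated list and that the codes below appear in acceptedStatusCodes; the values are illustrative only:

	code, err := GetStatusCode("404,500,503")
	if err != nil {
		log.Fatalf("unable to resolve status code, err: %v", err)
	}
	log.Infof("[Info]: injecting status code %s", code) // one of the accepted codes, chosen at random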
// getAcceptedCodesInList returns the list of accepted status codes from a list of status codes


@@ -0,0 +1,165 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/load/k6-loadgen/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectK6LoadGenFault")
defer span.End()
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
// creating the helper pod to perform k6-loadgen chaos
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareK6LoadGenFault")
defer span.End()
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Starting the k6-loadgen experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not execute chaos")
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateK6LoadGenFaultHelperPod")
defer span.End()
const volumeName = "script-volume"
const mountPath = "/mnt"
var envs []corev1.EnvVar
args := []string{
mountPath + "/" + experimentsDetails.ScriptSecretKey,
"-q",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--tag",
"trace_id=" + span.SpanContext().TraceID().String(),
}
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
envs = []corev1.EnvVar{
{
Name: "K6_OTEL_METRIC_PREFIX",
Value: experimentsDetails.OTELMetricPrefix,
},
{
Name: "K6_OTEL_GRPC_EXPORTER_INSECURE",
Value: "true",
},
{
Name: "K6_OTEL_GRPC_EXPORTER_ENDPOINT",
Value: otelExporterEndpoint,
},
}
args = append(args, "--out", "experimental-opentelemetry")
}
helperPod := &corev1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: corev1.PodSpec{
RestartPolicy: corev1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Containers: []corev1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: corev1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"k6",
"run",
},
Args: args,
Env: envs,
Resources: chaosDetails.Resources,
VolumeMounts: []corev1.VolumeMount{
{
Name: volumeName,
MountPath: mountPath,
},
},
},
},
Volumes: []corev1.Volume{
{
Name: volumeName,
VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: experimentsDetails.ScriptSecretName,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
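Putting the args above together, the helper container effectively runs a command of the following shape; the script key, duration, and trace id below are hypothetical, and the --out flag is only appended when OTEL_EXPORTER_OTLP_ENDPOINT is set:

	// effective invocation built from Command + Args (illustrative values):
	//   k6 run /mnt/script.js -q --duration 60s --tag trace_id=4bf92f3577b34da6a3ce929d0e0e4736 --out experimental-opentelemetry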


@@ -2,26 +2,33 @@ package lib

import (
	"context"
	"fmt"
	"strconv"
	"strings"
	"time"

	"github.com/litmuschaos/litmus-go/pkg/cerrors"
	"github.com/litmuschaos/litmus-go/pkg/telemetry"
	"github.com/litmuschaos/litmus-go/pkg/workloads"
	"github.com/palantir/stacktrace"
	"go.opentelemetry.io/otel"

	"github.com/litmuschaos/litmus-go/pkg/clients"
	"github.com/litmuschaos/litmus-go/pkg/events"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/kafka/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
	"github.com/litmuschaos/litmus-go/pkg/probe"
	"github.com/litmuschaos/litmus-go/pkg/status"
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/litmuschaos/litmus-go/pkg/utils/common"
	"github.com/sirupsen/logrus"
	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PreparePodDelete contains the prepration steps before chaos injection
func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKafkaPodDeleteFault")
	defer span.End()

	//Waiting for the ramp time before chaos injection
	if experimentsDetails.ChaoslibDetail.RampTime != 0 {
@@ -31,15 +38,15 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
	switch strings.ToLower(experimentsDetails.ChaoslibDetail.Sequence) {
	case "serial":
		if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
			return stacktrace.Propagate(err, "could not run chaos in serial mode")
		}
	case "parallel":
		if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
			return stacktrace.Propagate(err, "could not run chaos in parallel mode")
		}
	default:
		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.ChaoslibDetail.Sequence)}
	}

	//Waiting for the ramp time after chaos injection
@@ -51,11 +58,12 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
}

// injectChaosInSerialMode delete the kafka broker pods in serial mode(one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInSerialMode")
	defer span.End()

	// run the probes during chaos
	if len(resultDetails.ProbeDetails) != 0 {
		if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
			return err
		}
	}

@@ -68,9 +76,10 @@ func injectChaosInSerialMode(experimentsDetai
	for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {

		// Get the target pod details for the chaos execution
		// if the target pod is not defined it will derive the random target pod list using pod affected percentage
		if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
			return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
		}

		podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
		targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
		if err != nil {
@@ -78,17 +87,15 @@ func injectChaosInSerialMode(experimentsDetai
		}

		// deriving the parent name of the target resources
		for _, pod := range targetPodList.Items {
			kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
			if err != nil {
				return err
			}
			common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
		}
		for _, target := range chaosDetails.ParentsResources {
			common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
		}

		if experimentsDetails.ChaoslibDetail.EngineName != "" {
@@ -104,18 +111,18 @@ func injectChaosInSerialMode(experimentsDetai
				"PodName": pod.Name})

			if experimentsDetails.ChaoslibDetail.Force {
				err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
			} else {
				err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
			}
			if err != nil {
				return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
			}

			switch chaosDetails.Randomness {
			case true:
				if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
					return stacktrace.Propagate(err, "could not get random chaos interval")
				}
			default:
				//Waiting for the chaos interval after chaos injection
@@ -128,8 +135,15 @@ func injectChaosInSerialMode(experimentsDetai
			//Verify the status of pod after the chaos injection
			log.Info("[Status]: Verification for the recreation of application pod")
			for _, parent := range chaosDetails.ParentsResources {
				target := types.AppDetails{
					Names:     []string{parent.Name},
					Kind:      parent.Kind,
					Namespace: parent.Namespace,
				}
				if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
					return stacktrace.Propagate(err, "could not check pod statuses by workload names")
				}
			}
		}
		duration = int(time.Since(ChaosStartTimeStamp).Seconds())

@@ -140,11 +154,12 @@ func injectChaosInSerialMode(experimentsDetai
}

// injectChaosInParallelMode delete the kafka broker pods in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInParallelMode")
	defer span.End()

	// run the probes during chaos
	if len(resultDetails.ProbeDetails) != 0 {
		if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
			return err
		}
	}

@@ -157,27 +172,25 @@ func injectChaosInParallelMode(experimentsDet
	for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {

		// Get the target pod details for the chaos execution
		// if the target pod is not defined it will derive the random target pod list using pod affected percentage
		if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
			return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
		}

		podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
		targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
		if err != nil {
			return stacktrace.Propagate(err, "could not get target pods")
		}

		// deriving the parent name of the target resources
		for _, pod := range targetPodList.Items {
			kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
			if err != nil {
				return stacktrace.Propagate(err, "could not get pod owner name and kind")
			}
			common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
		}
		for _, target := range chaosDetails.ParentsResources {
			common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
		}

		if experimentsDetails.ChaoslibDetail.EngineName != "" {
@@ -193,19 +206,19 @@ func injectChaosInParallelMode(experimentsDet
				"PodName": pod.Name})

			if experimentsDetails.ChaoslibDetail.Force {
				err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
			} else {
				err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
			}
			if err != nil {
				return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
			}
		}

		switch chaosDetails.Randomness {
		case true:
			if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
				return stacktrace.Propagate(err, "could not get random chaos interval")
			}
		default:
			//Waiting for the chaos interval after chaos injection
@@ -218,8 +231,15 @@ func injectChaosInParallelMode(experimentsDet
		//Verify the status of pod after the chaos injection
		log.Info("[Status]: Verification for the recreation of application pod")
		for _, parent := range chaosDetails.ParentsResources {
			target := types.AppDetails{
				Names:     []string{parent.Name},
				Kind:      parent.Kind,
				Namespace: parent.Namespace,
			}
			if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
				return stacktrace.Propagate(err, "could not check pod statuses by workload names")
			}
		}
		duration = int(time.Since(ChaosStartTimeStamp).Seconds())


@@ -2,31 +2,38 @@ package lib

import (
	"context"
	"fmt"
	"strconv"

	"github.com/litmuschaos/litmus-go/pkg/cerrors"
	"github.com/litmuschaos/litmus-go/pkg/telemetry"
	"github.com/palantir/stacktrace"
	"go.opentelemetry.io/otel"

	"github.com/litmuschaos/litmus-go/pkg/clients"
	"github.com/litmuschaos/litmus-go/pkg/events"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/kubelet-service-kill/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
	"github.com/litmuschaos/litmus-go/pkg/status"
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/litmuschaos/litmus-go/pkg/utils/common"
	"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
	"github.com/sirupsen/logrus"
	apiv1 "k8s.io/api/core/v1"
	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PrepareKubeletKill contains prepration steps before chaos injection
func PrepareKubeletKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKubeletServiceKillFault")
	defer span.End()

	var err error
	if experimentsDetails.TargetNode == "" {
		//Select node for kubelet-service-kill
		experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
		if err != nil {
			return stacktrace.Propagate(err, "could not get node name")
		}
	}

@@ -34,7 +41,7 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
		"NodeName": experimentsDetails.TargetNode,
	})

	experimentsDetails.RunID = stringutils.GetRunID()

	//Waiting for the ramp time before chaos injection
	if experimentsDetails.RampTime != 0 {
@@ -50,54 +57,33 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
	if experimentsDetails.EngineName != "" {
		if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
			return stacktrace.Propagate(err, "could not set helper data")
		}
	}

	// Creating the helper pod to perform node memory hog
	if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
		return stacktrace.Propagate(err, "could not create helper pod")
	}

	appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)

	//Checking the status of helper pod
	if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
		return err
	}

	// Checking for the node to be in not-ready state
	log.Info("[Status]: Check for the node to be in NotReady state")
	if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
		if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
			return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
		}
		return stacktrace.Propagate(err, "could not check for NOT READY state")
	}

	if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
		return err
	}

	//Waiting for the ramp time after chaos injection
@@ -105,11 +91,14 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
		log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
		common.WaitForDuration(experimentsDetails.RampTime)
	}
	return nil
}
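Note that the helper pods for this fault are now selected with a run-scoped app= label rather than the earlier name= selector; with hypothetical experiment name and run ID values the selector resolves to:

	appLabel := fmt.Sprintf("app=%s-helper-%s", "kubelet-service-kill", "abc12")
	// appLabel == "app=kubelet-service-kill-helper-abc12"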
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateKubeletServiceKillFaultHelperPod")
	defer span.End()

	privileged := true
	terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)

@@ -118,7 +107,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
		ObjectMeta: v1.ObjectMeta{
			Name:        experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
			Namespace:   experimentsDetails.ChaosNamespace,
			Labels:      common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
			Annotations: chaosDetails.Annotations,
		},
		Spec: apiv1.PodSpec{
@@ -190,8 +179,16 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
		},
	}

	if len(chaosDetails.SideCar) != 0 {
		helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
		helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
	}

	if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
	}

	return nil
}

func ptrint64(p int64) *int64 {


@@ -1,6 +1,7 @@
package helper

import (
	"context"
	"fmt"
	"os"
	"os/exec"
@@ -10,8 +11,13 @@ import (
	"syscall"
	"time"

	"github.com/litmuschaos/litmus-go/pkg/cerrors"
	"github.com/litmuschaos/litmus-go/pkg/events"
	"github.com/litmuschaos/litmus-go/pkg/telemetry"
	"github.com/palantir/stacktrace"
	"go.opentelemetry.io/otel"

	clients "github.com/litmuschaos/litmus-go/pkg/clients"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
	"github.com/litmuschaos/litmus-go/pkg/result"
@@ -26,14 +32,15 @@ import (
)

var (
	err                                              error
	inject, abort                                    chan os.Signal
	sPorts, dPorts, whitelistDPorts, whitelistSPorts []string
)

// Helper injects the network chaos
func Helper(ctx context.Context, clients clients.ClientSets) {
	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodNetworkFault")
	defer span.End()

	experimentsDetails := experimentTypes.ExperimentDetails{}
	eventsDetails := types.EventDetails{}
@@ -54,10 +61,11 @@ func Helper(clients clients.ClientSets) {
	log.Info("[PreReq]: Getting the ENV variables")
	getENV(&experimentsDetails)

	// Initialise the chaos attributes
	types.InitialiseChaosVariables(&chaosDetails)
	chaosDetails.Phase = types.ChaosInjectPhase

	// Initialise Chaos Result Parameters
	types.SetResultAttributes(&resultDetails, chaosDetails)

	// Set the chaos result uid
@@ -65,213 +73,304 @@ func Helper(clients clients.ClientSets) {
	err := preparePodNetworkChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
	if err != nil {
		// update failstep inside chaosresult
		if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
			log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
		}
		log.Fatalf("helper pod failed, err: %v", err)
	}
}

// preparePodNetworkChaos contains the prepration steps before chaos injection
func preparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
	targetEnv := os.Getenv("TARGETS")
	if targetEnv == "" {
		return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "no target found, provide atleast one target"}
	}

	var targets []targetDetails

	for _, t := range strings.Split(targetEnv, ";") {
		target := strings.Split(t, ":")
		if len(target) != 4 {
			return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported target format: '%v'", targets)}
		}
		td := targetDetails{
			Name:            target[0],
			Namespace:       target[1],
			TargetContainer: target[2],
			DestinationIps:  getDestIps(target[3]),
			Source:          chaosDetails.ChaosPodName,
		}

		td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
		if err != nil {
			return stacktrace.Propagate(err, "could not get container id")
		}

		// extract out the network ns path of the pod sandbox or pause container
		td.NetworkNsPath, err = common.GetNetworkNsPath(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
		if err != nil {
			return stacktrace.Propagate(err, "could not get container network ns path")
		}

		targets = append(targets, td)
	}

	// watching for the abort signal and revert the chaos
	go abortWatcher(targets, experimentsDetails.NetworkInterface, resultDetails.Name, chaosDetails.ChaosNamespace)

	select {
	case <-inject:
		// stopping the chaos execution, if abort signal received
		os.Exit(1)
	default:
	}

	for index, t := range targets {
		// injecting network chaos inside target container
		if err = injectChaos(experimentsDetails.NetworkInterface, t); err != nil {
			if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
				return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
			}
			return stacktrace.Propagate(err, "could not inject chaos")
		}
		log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
		if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
			if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
				return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
			}
			return stacktrace.Propagate(err, "could not annotate chaosresult")
		}
	}

	if experimentsDetails.EngineName != "" {
		msg := "Injected " + experimentsDetails.ExperimentName + " chaos on application pods"
		types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
		events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
	}

	log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
	common.WaitForDuration(experimentsDetails.ChaosDuration)

	log.Info("[Chaos]: Duration is over, reverting chaos")

	if err := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
		return stacktrace.Propagate(err, "could not revert chaos")
	}

	return nil
}
func revertChaosForAllTargets(targets []targetDetails, networkInterface string, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
killed, err := killnetem(targets[i], networkInterface)
if !killed && err != nil {
errList = append(errList, err.Error())
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
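// Illustration of the index semantics above (hypothetical values): with three targets, a failure
// while injecting targets[2] calls revertChaosForAllTargets(..., index-1), i.e. index 1, so only the
// two targets that already received chaos are reverted; a failure while annotating targets[2] passes
// index = 2, so the target that was just injected is reverted as well. The loop is inclusive of index.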
// injectChaos inject the network chaos in target container
// it is using nsenter command to enter into network namespace of target container
// and execute the netem command inside it.
func injectChaos(netInterface string, target targetDetails) error {
netemCommands := os.Getenv("NETEM_COMMAND")
if len(target.DestinationIps) == 0 && len(sPorts) == 0 && len(dPorts) == 0 && len(whitelistDPorts) == 0 && len(whitelistSPorts) == 0 {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %s root %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create tc rules", target.Source); err != nil {
return err
}
} else {
// Create a priority-based queue
// This instantly creates classes 1:1, 1:2, 1:3
priority := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v root handle 1: prio", target.NetworkNsPath, netInterface)
log.Info(priority)
if err := common.RunBashCommand(priority, "failed to create priority-based queue", target.Source); err != nil {
return err
}
// Add queueing discipline for 1:3 class.
// No traffic is going through 1:3 yet
traffic := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v parent 1:3 %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(traffic)
if err := common.RunBashCommand(traffic, "failed to create netem queueing discipline", target.Source); err != nil {
return err
}
if len(whitelistDPorts) != 0 || len(whitelistSPorts) != 0 {
for _, port := range whitelistDPorts {
//redirect traffic to specific dport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip dport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist dport match filters", target.Source); err != nil {
return err
}
}
for _, port := range whitelistSPorts {
//redirect traffic to specific sport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip sport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist sport match filters", target.Source); err != nil {
return err
}
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst 0.0.0.0/0 flowid 1:3", target.NetworkNsPath, netInterface)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create rule for all ports match filters", target.Source); err != nil {
return err
}
} else {
for i := range target.DestinationIps {
var (
ip = target.DestinationIps[i]
ports []string
isIPV6 = strings.Contains(target.DestinationIps[i], ":")
)
// extracting the destination ports from the ips
// ip format is ip(|port1|port2....|portx)
if strings.Contains(target.DestinationIps[i], "|") {
ip = strings.Split(target.DestinationIps[i], "|")[0]
ports = strings.Split(target.DestinationIps[i], "|")[1:]
}
// redirect traffic to specific IP through band 3
filter := fmt.Sprintf("match ip dst %v", ip)
if isIPV6 {
filter = fmt.Sprintf("match ip6 dst %v", ip)
}
if len(ports) != 0 {
for _, port := range ports {
portFilter := fmt.Sprintf("%s match ip dport %v 0xffff", filter, port)
if isIPV6 {
portFilter = fmt.Sprintf("%s match ip6 dport %v 0xffff", filter, port)
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, portFilter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
continue
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, filter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
for _, port := range sPorts {
//redirect traffic to specific sport through band 3
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip sport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create source ports match filters", target.Source); err != nil {
return err
}
}
for _, port := range dPorts {
//redirect traffic to specific dport through band 3
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ports match filters", target.Source); err != nil {
return err
}
}
}
}
log.Infof("chaos injected successfully on {pod: %v, container: %v}", target.Name, target.TargetContainer)
return nil
}
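// For a rough illustration, assuming a hypothetical target with NetworkNsPath=/proc/1234/ns/net,
// netInterface=eth0, NETEM_COMMAND="netem loss 100", and DestinationIps=["10.0.0.5|8080"],
// injectChaos would build approximately the following commands:
//   sudo nsenter --net=/proc/1234/ns/net tc qdisc replace dev eth0 root handle 1: prio
//   sudo nsenter --net=/proc/1234/ns/net tc qdisc replace dev eth0 parent 1:3 netem loss 100
//   sudo nsenter --net=/proc/1234/ns/net tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 10.0.0.5 match ip dport 8080 0xffff flowid 1:3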
// killnetem kill the netem process for all the target containers
func killnetem(target targetDetails, networkInterface string) (bool, error) {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc delete dev %s root", target.NetworkNsPath, networkInterface)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
if err != nil {
log.Info(cmd.String())
// ignoring err if qdisc process doesn't exist inside the target container
if strings.Contains(string(out), qdiscNotFound) || strings.Contains(string(out), qdiscNoFileFound) {
log.Warn("The network chaos process has already been removed")
return true, err
}
log.Error(err.Error())
return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", target.Name, target.Namespace, target.TargetContainer), Reason: fmt.Sprintf("failed to revert network faults: %s", string(out))}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", target.Name, target.Namespace, target.TargetContainer)
return true, nil
}
type targetDetails struct {
Name string
Namespace string
ServiceMesh string
DestinationIps []string
TargetContainer string
ContainerId string
Source string
NetworkNsPath string
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.DestinationIPs = types.Getenv("DESTINATION_IPS", "")
experimentDetails.SourcePorts = types.Getenv("SOURCE_PORTS", "")
experimentDetails.DestinationPorts = types.Getenv("DESTINATION_PORTS", "")
if strings.TrimSpace(experimentDetails.DestinationPorts) != "" {
if strings.Contains(experimentDetails.DestinationPorts, "!") {
whitelistDPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.DestinationPorts), "!"), ",")
} else {
dPorts = strings.Split(strings.TrimSpace(experimentDetails.DestinationPorts), ",")
}
}
if strings.TrimSpace(experimentDetails.SourcePorts) != "" {
if strings.Contains(experimentDetails.SourcePorts, "!") {
whitelistSPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.SourcePorts), "!"), ",")
} else {
sPorts = strings.Split(strings.TrimSpace(experimentDetails.SourcePorts), ",")
}
}
}
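// A quick sketch of the port parsing above (hypothetical values): DESTINATION_PORTS="80,443"
// yields dPorts=["80","443"], scoping the chaos to those destination ports, while the negated
// form DESTINATION_PORTS="!8080,9090" yields whitelistDPorts=["8080","9090"], which injectChaos
// routes through band 1:2 so that traffic is excluded from the netem discipline.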
// abortWatcher continuously watch for the abort signals
func abortWatcher(targets []targetDetails, networkInterface, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
@ -279,15 +378,46 @@ func abortWatcher(targetPID int, networkInterface, resultName, chaosNS, targetPo
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
killed, err := killnetem(t, networkInterface)
if err != nil && !killed {
log.Errorf("unable to kill netem process, err :%v", err)
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
func getDestIps(serviceMesh string) []string {
var (
destIps = os.Getenv("DESTINATION_IPS")
uniqueIps []string
)
if serviceMesh == "true" {
destIps = os.Getenv("DESTINATION_IPS_SERVICE_MESH")
}
if strings.TrimSpace(destIps) == "" {
return nil
}
ips := strings.Split(strings.TrimSpace(destIps), ",")
// removing duplicates ips from the list, if any
for i := range ips {
if !common.Contains(ips[i], uniqueIps) {
uniqueIps = append(uniqueIps, ips[i])
}
}
return uniqueIps
}
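// Sketch with hypothetical values: DESTINATION_IPS="10.0.0.5,10.0.0.5,10.0.0.6|443" returns
// ["10.0.0.5", "10.0.0.6|443"] after de-duplication; when serviceMesh is "true" the same logic
// reads DESTINATION_IPS_SERVICE_MESH instead of DESTINATION_IPS.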
@ -1,15 +1,26 @@
package corruption
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkCorruptionFault")
defer span.End()
args := "netem corrupt " + experimentsDetails.NetworkPacketCorruptionPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
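// As an example with hypothetical tunables, NetworkPacketCorruptionPercentage="100" and
// Correlation=25 produce args "netem corrupt 100 25"; with Correlation=0 the args stay
// "netem corrupt 100". The duplication, latency, loss, and rate-limit wrappers below follow
// the same pattern with their respective netem/tbf arguments.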
@ -1,15 +1,26 @@
package duplication
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkDuplicationFault")
defer span.End()
args := "netem duplicate " + experimentsDetails.NetworkPacketDuplicationPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
@ -1,17 +1,27 @@
package latency
import (
"context"
"fmt"
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLatencyFault")
defer span.End()
args := "netem delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms " + strconv.Itoa(experimentsDetails.Jitter) + "ms"
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
@ -1,15 +1,26 @@
package loss
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLossFault")
defer span.End()
args := "netem loss " + experimentsDetails.NetworkPacketLossPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
@ -3,95 +3,50 @@ package lib
import (
"context"
"fmt"
"net"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var serviceMesh = []string{"istio", "envoy"}
var destIpsSvcMesh string
var destIps string
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
var err error
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
logExperimentFields(experimentsDetails)
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -103,40 +58,41 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode inject the network chaos in all target application serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -144,51 +100,27 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not set destination ips")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer, serviceMesh), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@ -196,89 +128,68 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode inject the network chaos in all target application in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInParallelMode")
defer span.End()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
targets, err := filterPodsForNodes(targetPodList, experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not filter target pods")
}
runID := stringutils.GetRunID()
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s:%s", k.Name, k.Namespace, k.TargetContainer, k.ServiceMesh))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets string, nodeName, runID, args string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodNetworkFaultHelperPod")
defer span.End()
var (
privilegedEnable = true
terminationGracePeriodSeconds = int64(experimentsDetails.TerminationGracePeriodSeconds)
helperName = fmt.Sprintf("%s-helper-%s", experimentsDetails.ExperimentName, stringutils.GetRunID())
)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: helperName,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Tolerations: chaosDetails.Tolerations,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
@ -306,7 +217,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name network-chaos", "./helpers -name network-chaos",
}, },
Resources: chaosDetails.Resources, Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName, args, destIPs), Env: getPodEnv(ctx, experimentsDetails, targets, args),
VolumeMounts: []apiv1.VolumeMount{ VolumeMounts: []apiv1.VolumeMount{
{ {
Name: "cri-socket", Name: "cri-socket",
@ -327,18 +238,40 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
// mount the network ns path for crio runtime
// it is required to access the sandbox network ns
if strings.ToLower(experimentsDetails.ContainerRuntime) == "crio" {
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, apiv1.Volume{
Name: "netns-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/netns",
},
},
})
helperPod.Spec.Containers[0].VolumeMounts = append(helperPod.Spec.Containers[0].VolumeMounts, apiv1.VolumeMount{
Name: "netns-path",
MountPath: "/var/run/netns",
})
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derive all the env required for the helper pod
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string, args string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@ -348,23 +281,37 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, a
SetEnv("NETWORK_INTERFACE", experimentsDetails.NetworkInterface). SetEnv("NETWORK_INTERFACE", experimentsDetails.NetworkInterface).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName). SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath). SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("DESTINATION_IPS", destIPs).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID). SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("DESTINATION_IPS", destIps).
SetEnv("DESTINATION_IPS_SERVICE_MESH", destIpsSvcMesh).
SetEnv("SOURCE_PORTS", experimentsDetails.SourcePorts). SetEnv("SOURCE_PORTS", experimentsDetails.SourcePorts).
SetEnv("DESTINATION_PORTS", experimentsDetails.DestinationPorts). SetEnv("DESTINATION_PORTS", experimentsDetails.DestinationPorts).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name") SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV return envDetails.ENV
} }
type targetsDetails struct {
Target []target
}
type target struct {
Namespace string
Name string
TargetContainer string
ServiceMesh string
}
// GetTargetIps return the comma separated target ips
// It fetches the ips from the target ips (if defined by users)
// it appends the ips from the host, if target host is provided
func GetTargetIps(targetIPs, targetHosts string, clients clients.ClientSets, serviceMesh bool) (string, error) {
ipsFromHost, err := getIpsForTargetHosts(targetHosts, clients, serviceMesh)
if err != nil {
return "", stacktrace.Propagate(err, "could not get ips from target hosts")
}
if targetIPs == "" {
targetIPs = ipsFromHost
@ -374,31 +321,46 @@ func GetTargetIps(targetIPs, targetHosts string, clients clients.ClientSets, ser
return targetIPs, nil
}
// it derives the pod ips from the kubernetes service
func getPodIPFromService(host string, clients clients.ClientSets) ([]string, error) {
var ips []string
svcFields := strings.Split(host, ".")
if len(svcFields) != 5 {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{host: %s}", host), Reason: "provide the valid FQDN for service in '<svc-name>.<namespace>.svc.cluster.local format"}
}
svcName, svcNs := svcFields[0], svcFields[1]
svc, err := clients.GetService(svcNs, svcName)
if err != nil {
if k8serrors.IsForbidden(err) {
log.Warnf("forbidden - failed to get %v service in %v namespace, err: %v", svcName, svcNs, err)
return ips, nil
}
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{serviceName: %s, namespace: %s}", svcName, svcNs), Reason: err.Error()}
}
if svc.Spec.Selector == nil {
return nil, nil
}
var svcSelector string
for k, v := range svc.Spec.Selector {
if svcSelector == "" {
svcSelector += fmt.Sprintf("%s=%s", k, v)
continue
}
svcSelector += fmt.Sprintf(",%s=%s", k, v)
}
pods, err := clients.ListPods(svcNs, svcSelector)
if err != nil {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{svcName: %s,podLabel: %s, namespace: %s}", svcNs, svcSelector, svcNs), Reason: fmt.Sprintf("failed to derive pods from service: %s", err.Error())}
}
for _, p := range pods.Items {
if p.Status.PodIP == "" {
continue
}
ips = append(ips, p.Status.PodIP)
}
return ips, nil
}
@ -412,27 +374,49 @@ func getIpsForTargetHosts(targetHosts string, clients clients.ClientSets, servic
var commaSeparatedIPs []string
for i := range hosts {
hosts[i] = strings.TrimSpace(hosts[i])
var (
hostName = hosts[i]
ports []string
)
if strings.Contains(hosts[i], "|") {
host := strings.Split(hosts[i], "|")
hostName = host[0]
ports = host[1:]
log.Infof("host and port: %v :%v", hostName, ports)
}
if strings.Contains(hostName, "svc.cluster.local") && serviceMesh {
ips, err := getPodIPFromService(hostName, clients)
if err != nil {
return "", stacktrace.Propagate(err, "could not get pod ips from service")
}
log.Infof("Host: {%v}, IP address: {%v}", hosts[i], ips)
if ports != nil {
for j := range ips {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j]+"|"+strings.Join(ports, "|"))
}
} else {
commaSeparatedIPs = append(commaSeparatedIPs, ips...)
}
if finalHosts == "" {
finalHosts = hosts[i]
} else {
finalHosts = finalHosts + "," + hosts[i]
}
finalHosts = finalHosts + "," + hosts[i]
continue
}
ips, err := net.LookupIP(hostName)
if err != nil {
log.Warnf("Unknown host: {%v}, it won't be included in the scope of chaos", hostName)
} else {
for j := range ips {
log.Infof("Host: {%v}, IP address: {%v}", hostName, ips[j])
if ports != nil {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String()+"|"+strings.Join(ports, "|"))
continue
}
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String())
}
if finalHosts == "" {
@ -443,14 +427,14 @@ func getIpsForTargetHosts(targetHosts string, clients clients.ClientSets, servic
}
}
if len(commaSeparatedIPs) == 0 {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("hosts: %s", targetHosts), Reason: "provided hosts are invalid, unable to resolve"}
}
log.Infof("Injecting chaos on {%v} hosts", finalHosts)
return strings.Join(commaSeparatedIPs, ","), nil
}
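// Example of the host|port syntax handled above (hypothetical values): a DESTINATION_HOSTS entry
// "payments.svc.example.com|8080|8443" is split into hostName="payments.svc.example.com" and
// ports=["8080","8443"]; if the name resolves to 10.12.3.4, the value appended is
// "10.12.3.4|8080|8443", which injectChaos later splits back into an IP plus per-port u32 filters.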
// SetChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.NetworkPacketLossPercentage = common.ValidateRange(experimentsDetails.NetworkPacketLossPercentage)
experimentsDetails.NetworkPacketCorruptionPercentage = common.ValidateRange(experimentsDetails.NetworkPacketCorruptionPercentage)
@ -462,9 +446,102 @@ func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
// It checks if pod contains service mesh sidecar
func isServiceMeshEnabledForPod(pod apiv1.Pod) bool {
for _, c := range pod.Spec.Containers {
if common.SubStringExistsInSlice(c.Name, serviceMesh) {
return true
}
}
return false
}
func setDestIps(pod apiv1.Pod, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
var err error
if isServiceMeshEnabledForPod(pod) {
if destIpsSvcMesh == "" {
destIpsSvcMesh, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, true)
if err != nil {
return "false", err
}
}
return "true", nil
}
if destIps == "" {
destIps, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, false)
if err != nil {
return "false", err
}
}
return "false", nil
}
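As a rough illustration, hedged and with a stand-in resolver: setDestIps above caches the resolved destination IPs in the package-level destIps/destIpsSvcMesh variables, so the lookup runs only once per experiment and later pods reuse the cached value. The sketch below only mimics that memoization pattern; it is not the real GetTargetIps.

// Sketch of the resolve-once, reuse-for-every-pod pattern used by setDestIps.
package main

import "fmt"

var cachedDestIPs string

// stand-in resolver; the real code calls GetTargetIps with the experiment inputs
func resolveTargets() string { return "10.96.0.12,10.96.0.13" }

func destIPsFor(podName string) string {
	if cachedDestIPs == "" {
		fmt.Println("resolving targets for", podName)
		cachedDestIPs = resolveTargets()
	}
	return cachedDestIPs
}

func main() {
	fmt.Println(destIPsFor("pod-a")) // triggers the single resolution
	fmt.Println(destIPsFor("pod-b")) // reuses the cached list
}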
func filterPodsForNodes(targetPodList apiv1.PodList, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (map[string]*targetsDetails, error) {
targets := make(map[string]*targetsDetails)
targetContainer := experimentsDetails.TargetContainer
for _, pod := range targetPodList.Items {
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
if err != nil {
return targets, stacktrace.Propagate(err, "could not set destination ips")
}
if experimentsDetails.TargetContainer == "" {
targetContainer = pod.Spec.Containers[0].Name
}
td := target{
Name: pod.Name,
Namespace: pod.Namespace,
TargetContainer: targetContainer,
ServiceMesh: serviceMesh,
}
if targets[pod.Spec.NodeName] == nil {
targets[pod.Spec.NodeName] = &targetsDetails{
Target: []target{td},
}
} else {
targets[pod.Spec.NodeName].Target = append(targets[pod.Spec.NodeName].Target, td)
}
}
return targets, nil
}
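A minimal sketch, with simplified stand-in types and hypothetical pod names, of the grouping that filterPodsForNodes builds above: one target entry per pod, keyed by the node that hosts it.

// Group targets by node name, in the spirit of filterPodsForNodes.
package main

import "fmt"

type target struct {
	Name, Namespace, TargetContainer, ServiceMesh string
}

type targetsDetails struct {
	Target []target
}

func main() {
	pods := []struct{ name, node string }{
		{"nginx-a", "worker-1"},
		{"nginx-b", "worker-1"},
		{"nginx-c", "worker-2"},
	}

	targets := make(map[string]*targetsDetails)
	for _, p := range pods {
		td := target{Name: p.name, Namespace: "default", TargetContainer: "nginx", ServiceMesh: "false"}
		if targets[p.node] == nil {
			targets[p.node] = &targetsDetails{Target: []target{td}}
		} else {
			targets[p.node].Target = append(targets[p.node].Target, td)
		}
	}
	fmt.Println(len(targets["worker-1"].Target), len(targets["worker-2"].Target)) // 2 1
}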
func logExperimentFields(experimentsDetails *experimentTypes.ExperimentDetails) {
switch experimentsDetails.NetworkChaosType {
case "network-loss":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketLossPercentage": experimentsDetails.NetworkPacketLossPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-latency":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkLatency": strconv.Itoa(experimentsDetails.NetworkLatency),
"Jitter": experimentsDetails.Jitter,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-corruption":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketCorruptionPercentage": experimentsDetails.NetworkPacketCorruptionPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-duplication":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketDuplicationPercentage": experimentsDetails.NetworkPacketDuplicationPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-rate-limit":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkBandwidth": experimentsDetails.NetworkBandwidth,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
}
}
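As an assumed illustration only: tunables such as NetworkLatency, Jitter, and Correlation logged above are the kind of values that typically end up as tc netem arguments for the latency fault. The exact mapping lives elsewhere in this package and is not shown in this compare view; the values below are made up.

// Hypothetical netem argument shape for a latency fault with jitter and correlation.
package main

import "fmt"

func main() {
	latencyMs, jitterMs, correlation := 2000, 100, "25"

	args := fmt.Sprintf("netem delay %dms %dms %s%%", latencyMs, jitterMs, correlation)
	fmt.Println(args) // netem delay 2000ms 100ms 25%
}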


@ -0,0 +1,29 @@
package rate
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkRateChaos contains the steps to prepare and inject chaos
func PodNetworkRateChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkRateLimit")
defer span.End()
args := fmt.Sprintf("tbf rate %s burst %s limit %s", experimentsDetails.NetworkBandwidth, experimentsDetails.Burst, experimentsDetails.Limit)
if experimentsDetails.PeakRate != "" {
args = fmt.Sprintf("%s peakrate %s", args, experimentsDetails.PeakRate)
}
if experimentsDetails.MinBurst != "" {
args = fmt.Sprintf("%s mtu %s", args, experimentsDetails.MinBurst)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
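As a rough illustration of the argument string built by PodNetworkRateChaos above: it expands to a tc tbf qdisc specification, with peakrate and mtu appended only when the corresponding tunables are set. The bandwidth, burst, limit, peakrate, and mtu values below are hypothetical; the helper that eventually applies the qdisc on the target interface is outside this snippet.

// Mirror of the arg construction above, with example values.
package main

import "fmt"

func main() {
	bandwidth, burst, limit := "1mbit", "32kb", "125kb"
	peakRate, minBurst := "2mbit", "1600"

	args := fmt.Sprintf("tbf rate %s burst %s limit %s", bandwidth, burst, limit)
	if peakRate != "" {
		args = fmt.Sprintf("%s peakrate %s", args, peakRate)
	}
	if minBurst != "" {
		args = fmt.Sprintf("%s mtu %s", args, minBurst)
	}
	fmt.Println(args) // tbf rate 1mbit burst 32kb limit 125kb peakrate 2mbit mtu 1600
}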


@@ -2,10 +2,16 @@ package lib
import (
	"context"
+	"fmt"
	"strconv"
	"strings"

-	clients "github.com/litmuschaos/litmus-go/pkg/clients"
+	"github.com/litmuschaos/litmus-go/pkg/cerrors"
+	"github.com/litmuschaos/litmus-go/pkg/telemetry"
+	"github.com/palantir/stacktrace"
+	"go.opentelemetry.io/otel"
+
+	"github.com/litmuschaos/litmus-go/pkg/clients"
	"github.com/litmuschaos/litmus-go/pkg/events"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-cpu-hog/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
@@ -13,23 +19,25 @@ import (
	"github.com/litmuschaos/litmus-go/pkg/status"
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/litmuschaos/litmus-go/pkg/utils/common"
-	"github.com/pkg/errors"
+	"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
	"github.com/sirupsen/logrus"
	apiv1 "k8s.io/api/core/v1"
	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

-// PrepareNodeCPUHog contains prepration steps before chaos injection
-func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+// PrepareNodeCPUHog contains preparation steps before chaos injection
+func PrepareNodeCPUHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeCPUHogFault")
+	defer span.End()

-	//setup the tunables if provided in range
+	//set up the tunables if provided in range
	setChaosTunables(experimentsDetails)

	log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
		"Node CPU Cores": experimentsDetails.NodeCPUcores,
		"CPU Load":       experimentsDetails.CPULoad,
-		"Node Affce Perc": experimentsDetails.NodesAffectedPerc,
+		"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
		"Sequence": experimentsDetails.Sequence,
	})

	//Waiting for the ramp time before chaos injection
@@ -42,7 +50,7 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
	nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
	targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
	if err != nil {
-		return err
+		return stacktrace.Propagate(err, "could not get node list")
	}
	log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
@@ -52,21 +60,21 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
	if experimentsDetails.EngineName != "" {
		if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
-			return err
+			return stacktrace.Propagate(err, "could not set helper data")
		}
	}

	switch strings.ToLower(experimentsDetails.Sequence) {
	case "serial":
-		if err = injectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-			return err
+		if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+			return stacktrace.Propagate(err, "could not run chaos in serial mode")
		}
	case "parallel":
-		if err = injectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-			return err
+		if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+			return stacktrace.Propagate(err, "could not run chaos in parallel mode")
		}
	default:
-		return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
	}

	//Waiting for the ramp time after chaos injection
@@ -78,14 +86,15 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
}

// injectChaosInSerialMode stress the cpu of all the target nodes serially (one by one)
-func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInSerialMode")
+	defer span.End()

	nodeCPUCores := experimentsDetails.NodeCPUcores
-	labelSuffix := common.GetRunID()

	// run the probes during chaos
	if len(resultDetails.ProbeDetails) != 0 {
-		if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+		if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
			return err
		}
	}
@@ -101,29 +110,29 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
		// When number of cpu cores for hogging is not defined , it will take it from node capacity
		if nodeCPUCores == "0" {
			if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
-				return err
+				return stacktrace.Propagate(err, "could not get node cpu capacity")
			}
		}

		log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
			"NodeName": appNode,
-			"NodeCPUcores": experimentsDetails.NodeCPUcores,
+			"NodeCPUCores": experimentsDetails.NodeCPUcores,
		})

-		experimentsDetails.RunID = common.GetRunID()
+		experimentsDetails.RunID = stringutils.GetRunID()

		// Creating the helper pod to perform node cpu hog
-		if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
-			return errors.Errorf("unable to create the helper pod, err: %v", err)
+		if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
+			return stacktrace.Propagate(err, "could not create helper pod")
		}

-		appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
+		appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)

		//Checking the status of helper pod
		log.Info("[Status]: Checking the status of the helper pod")
		if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
-			common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
-			return errors.Errorf("helper pod is not in running state, err: %v", err)
+			common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
+			return stacktrace.Propagate(err, "could not check helper status")
		}

		common.SetTargets(appNode, "targeted", "node", chaosDetails)
@@ -132,32 +141,35 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
		log.Info("[Wait]: Waiting till the completion of the helper pod")
		podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
		if err != nil || podStatus == "Failed" {
-			common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
-			return common.HelperFailedError(err)
+			common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
+			return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
		}

		//Deleting the helper pod
		log.Info("[Cleanup]: Deleting the helper pod")
-		if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
-			return errors.Errorf("unable to delete the helper pod, err: %v", err)
+		if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
+			return stacktrace.Propagate(err, "could not delete helper pod(s)")
		}
	}
	return nil
}

// injectChaosInParallelMode stress the cpu of all the target nodes in parallel mode (all at once)
-func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
-	nodeCPUCores := experimentsDetails.NodeCPUcores
-
-	labelSuffix := common.GetRunID()
+func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInParallelMode")
+	defer span.End()
+
+	nodeCPUCores := experimentsDetails.NodeCPUcores

	// run the probes during chaos
	if len(resultDetails.ProbeDetails) != 0 {
-		if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+		if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
			return err
		}
	}

+	experimentsDetails.RunID = stringutils.GetRunID()
+
	for _, appNode := range targetNodeList {

		if experimentsDetails.EngineName != "" {
@@ -169,7 +181,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
		// When number of cpu cores for hogging is not defined , it will take it from node capacity
		if nodeCPUCores == "0" {
			if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
-				return err
+				return stacktrace.Propagate(err, "could not get node cpu capacity")
			}
		}
@@ -178,65 +190,44 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
			"NodeCPUcores": experimentsDetails.NodeCPUcores,
		})

-		experimentsDetails.RunID = common.GetRunID()
-
		// Creating the helper pod to perform node cpu hog
-		if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
-			return errors.Errorf("unable to create the helper pod, err: %v", err)
+		if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
+			return stacktrace.Propagate(err, "could not create helper pod")
		}
	}

-	appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
-
-	//Checking the status of helper pod
-	log.Info("[Status]: Checking the status of the helper pods")
-	if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
-		common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
-		return errors.Errorf("helper pod is not in running state, err: %v", err)
-	}
-
-	for _, appNode := range targetNodeList {
-		common.SetTargets(appNode, "targeted", "node", chaosDetails)
-	}
-
-	// Wait till the completion of helper pod
-	log.Info("[Wait]: Waiting till the completion of the helper pod")
-	podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
-	if err != nil || podStatus == "Failed" {
-		common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
-		return common.HelperFailedError(err)
-	}
-
-	//Deleting the helper pod
-	log.Info("[Cleanup]: Deleting the helper pod")
-	if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
-		return errors.Errorf("unable to delete the helper pod, err: %v", err)
+	appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
+
+	if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
+		return err
	}
	return nil
}

-//setCPUCapacity fetch the node cpu capacity
+// setCPUCapacity fetch the node cpu capacity
func setCPUCapacity(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets) error {
-	node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), appNode, v1.GetOptions{})
+	node, err := clients.GetNode(appNode, experimentsDetails.Timeout, experimentsDetails.Delay)
	if err != nil {
-		return err
+		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNode), Reason: err.Error()}
	}
	experimentsDetails.NodeCPUcores = node.Status.Capacity.Cpu().String()
	return nil
}

// createHelperPod derive the attributes for helper pod and create the helper pod
-func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, labelSuffix string) error {
+func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeCPUHogFaultHelperPod")
+	defer span.End()

	terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)

	helperPod := &apiv1.Pod{
		ObjectMeta: v1.ObjectMeta{
-			Name:        experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
+			GenerateName: experimentsDetails.ExperimentName + "-helper-",
			Namespace:   experimentsDetails.ChaosNamespace,
-			Labels:      common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, labelSuffix, experimentsDetails.ExperimentName),
+			Labels:      common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
			Annotations: chaosDetails.Annotations,
		},
		Spec: apiv1.PodSpec{
			RestartPolicy: apiv1.RestartPolicyNever,
@@ -265,12 +256,20 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
		},
	}

-	_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
-	return err
+	if len(chaosDetails.SideCar) != 0 {
+		helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
+		helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
+	}
+
+	if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
+		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
+	}
+	return nil
}

-//setChaosTunables will setup a random value within a given range of values
-//If the value is not provided in range it'll setup the initial provided value.
+// setChaosTunables will set up a random value within a given range of values
+// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
	experimentsDetails.NodeCPUcores = common.ValidateRange(experimentsDetails.NodeCPUcores)
	experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)
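As a rough illustration of the naming change above: with GenerateName the helper pod no longer has a fixed name, and later steps select it purely by the run-ID label. The experiment name and run ID below are hypothetical example values.

// Sketch of GenerateName plus label-based selection for the helper pod.
package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	experimentName, runID := "node-cpu-hog", "abc12" // hypothetical values

	pod := &apiv1.Pod{
		ObjectMeta: v1.ObjectMeta{
			// the API server appends a random suffix to this prefix
			GenerateName: experimentName + "-helper-",
			Labels:       map[string]string{"app": experimentName + "-helper-" + runID},
		},
	}

	// later lifecycle steps find the pod via this selector, not via its name
	appLabel := fmt.Sprintf("app=%s-helper-%s", experimentName, runID)
	fmt.Println(pod.ObjectMeta.GenerateName, appLabel)
}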


@@ -1,8 +1,8 @@
package lib

import (
-	"bytes"
	"context"
+	"fmt"
	"os"
	"os/exec"
	"os/signal"
@@ -11,7 +11,12 @@ import (
	"syscall"
	"time"

-	clients "github.com/litmuschaos/litmus-go/pkg/clients"
+	"github.com/litmuschaos/litmus-go/pkg/cerrors"
+	"github.com/litmuschaos/litmus-go/pkg/telemetry"
+	"github.com/palantir/stacktrace"
+	"go.opentelemetry.io/otel"
+
+	"github.com/litmuschaos/litmus-go/pkg/clients"
	"github.com/litmuschaos/litmus-go/pkg/events"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-drain/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
@@ -20,7 +25,6 @@ import (
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/litmuschaos/litmus-go/pkg/utils/common"
	"github.com/litmuschaos/litmus-go/pkg/utils/retry"
-	"github.com/pkg/errors"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
@@ -30,8 +34,10 @@ var (
	inject, abort chan os.Signal
)

-//PrepareNodeDrain contains the prepration steps before chaos injection
-func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+// PrepareNodeDrain contains the preparation steps before chaos injection
+func PrepareNodeDrain(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeDrainFault")
+	defer span.End()

	// inject channel is used to transmit signal notifications.
	inject = make(chan os.Signal, 1)
@@ -53,7 +59,7 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
		//Select node for kubelet-service-kill
		experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
		if err != nil {
-			return err
+			return stacktrace.Propagate(err, "could not get node name")
		}
	}
@@ -65,7 +71,7 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
	// run the probes during chaos
	if len(resultDetails.ProbeDetails) != 0 {
-		if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+		if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
			return err
		}
	}
@@ -74,18 +80,22 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
	go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)

	// Drain the application node
-	if err := drainNode(experimentsDetails, clients, chaosDetails); err != nil {
-		return err
+	if err := drainNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
+		log.Info("[Revert]: Reverting chaos because error during draining of node")
+		if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
+			return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
+		}
+		return stacktrace.Propagate(err, "could not drain node")
	}

	// Verify the status of AUT after reschedule
	log.Info("[Status]: Verify the status of AUT after reschedule")
-	if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
+	if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
		log.Info("[Revert]: Reverting chaos because application status check failed")
		if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
-			log.Errorf("Unable to uncordon the node, err: %v", uncordonErr)
+			return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
		}
-		return errors.Errorf("application status check failed, err: %v", err)
+		return err
	}

	// Verify the status of Auxiliary Applications after reschedule
@@ -94,9 +104,9 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
		if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
			log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
			if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
-				log.Errorf("Unable to uncordon the node, err: %v", uncordonErr)
+				return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
			}
-			return errors.Errorf("auxiliary Applications status check failed, err: %v", err)
+			return err
		}
	}
@@ -108,7 +118,7 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
	// Uncordon the application node
	if err := uncordonNode(experimentsDetails, clients, chaosDetails); err != nil {
-		return err
+		return stacktrace.Propagate(err, "could not uncordon the target node")
	}

	//Waiting for the ramp time after chaos injection
@@ -119,8 +129,10 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
	return nil
}

-// drainNode drain the application node
-func drainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
+// drainNode drain the target node
+func drainNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeDrainFault")
+	defer span.End()

	select {
	case <-inject:
@@ -130,12 +142,8 @@ func drainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
		log.Infof("[Inject]: Draining the %v node", experimentsDetails.TargetNode)

		command := exec.Command("kubectl", "drain", experimentsDetails.TargetNode, "--ignore-daemonsets", "--delete-emptydir-data", "--force", "--timeout", strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
-		var out, stderr bytes.Buffer
-		command.Stdout = &out
-		command.Stderr = &stderr
-		if err := command.Run(); err != nil {
-			log.Infof("Error String: %v", stderr.String())
-			return errors.Errorf("Unable to drain the %v node, err: %v", experimentsDetails.TargetNode, err)
+		if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), "failed to drain the target node", cerrors.ErrorTypeChaosInject); err != nil {
+			return err
		}

		common.SetTargets(experimentsDetails.TargetNode, "injected", "node", chaosDetails)
@@ -146,10 +154,10 @@ func drainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
			Try(func(attempt uint) error {
				nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{})
				if err != nil {
-					return err
+					return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
				}
				if !nodeSpec.Spec.Unschedulable {
-					return errors.Errorf("%v node is not in unschedulable state", experimentsDetails.TargetNode)
+					return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: "node is not in unschedule state"}
				}
				return nil
			})
@@ -164,25 +172,21 @@ func uncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients
	for _, targetNode := range targetNodes {

		//Check node exist before uncordon the node
-		_, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), targetNode, v1.GetOptions{})
+		_, err := clients.GetNode(targetNode, chaosDetails.Timeout, chaosDetails.Delay)
		if err != nil {
			if apierrors.IsNotFound(err) {
				log.Infof("[Info]: The %v node is no longer exist, skip uncordon the node", targetNode)
				common.SetTargets(targetNode, "noLongerExist", "node", chaosDetails)
				continue
			} else {
-				return errors.Errorf("unable to get the %v node, err: %v", targetNode, err)
+				return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
			}
		}

		log.Infof("[Recover]: Uncordon the %v node", targetNode)

		command := exec.Command("kubectl", "uncordon", targetNode)
-		var out, stderr bytes.Buffer
-		command.Stdout = &out
-		command.Stderr = &stderr
-		if err := command.Run(); err != nil {
-			log.Infof("Error String: %v", stderr.String())
-			return errors.Errorf("unable to uncordon the %v node, err: %v", targetNode, err)
+		if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", targetNode), "failed to uncordon the target node", cerrors.ErrorTypeChaosInject); err != nil {
+			return err
		}

		common.SetTargets(targetNode, "reverted", "node", chaosDetails)
	}
@@ -198,11 +202,11 @@ func uncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients
			if apierrors.IsNotFound(err) {
				continue
			} else {
-				return err
+				return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
			}
		}
		if nodeSpec.Spec.Unschedulable {
-			return errors.Errorf("%v node is in unschedulable state", experimentsDetails.TargetNode)
+			return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: "target node is in unschedule state"}
		}
	}
	return nil
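As a rough, assumed illustration: common.RunCLICommands itself is not shown in this compare view, so the stand-in below only suggests the kind of behavior the kubectl drain/uncordon calls above appear to delegate to it, namely running the command, capturing stderr, and folding a failure into a single error. Its real signature and error types live in pkg/utils/common and may differ.

// Stand-in CLI wrapper sketch; the kubectl command used here is only an example.
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

func runCLICommand(cmd *exec.Cmd, reason string) error {
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		// surface both the exec error and whatever the tool printed on stderr
		return fmt.Errorf("%s: %v, stderr: %s", reason, err, stderr.String())
	}
	return nil
}

func main() {
	// Hypothetical example; a real caller would pass the drain or uncordon command.
	cmd := exec.Command("kubectl", "version", "--client")
	if err := runCLICommand(cmd, "failed to run kubectl"); err != nil {
		fmt.Println(err)
	}
}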


@@ -2,10 +2,16 @@ package lib
import (
	"context"
+	"fmt"
	"strconv"
	"strings"

-	clients "github.com/litmuschaos/litmus-go/pkg/clients"
+	"github.com/litmuschaos/litmus-go/pkg/cerrors"
+	"github.com/litmuschaos/litmus-go/pkg/telemetry"
+	"github.com/palantir/stacktrace"
+	"go.opentelemetry.io/otel"
+
+	"github.com/litmuschaos/litmus-go/pkg/clients"
	"github.com/litmuschaos/litmus-go/pkg/events"
	experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-io-stress/types"
	"github.com/litmuschaos/litmus-go/pkg/log"
@@ -13,16 +19,17 @@ import (
	"github.com/litmuschaos/litmus-go/pkg/status"
	"github.com/litmuschaos/litmus-go/pkg/types"
	"github.com/litmuschaos/litmus-go/pkg/utils/common"
-	"github.com/pkg/errors"
+	"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
	"github.com/sirupsen/logrus"
	apiv1 "k8s.io/api/core/v1"
	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

-// PrepareNodeIOStress contains prepration steps before chaos injection
-func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
-
-	//setup the tunables if provided in range
+// PrepareNodeIOStress contains preparation steps before chaos injection
+func PrepareNodeIOStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeIOStressFault")
+	defer span.End()
+
+	//set up the tunables if provided in range
	setChaosTunables(experimentsDetails)

	log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
@@ -30,7 +37,7 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
		"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
		"CPU Core":        experimentsDetails.CPU,
		"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
-		"Node Affce Perc": experimentsDetails.NodesAffectedPerc,
+		"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
		"Sequence": experimentsDetails.Sequence,
	})
@@ -44,7 +51,7 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
	nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
	targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
	if err != nil {
-		return err
+		return stacktrace.Propagate(err, "could not get node list")
	}
	log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
		"No. Of Nodes": len(targetNodeList),
@@ -53,21 +60,21 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
	if experimentsDetails.EngineName != "" {
		if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
-			return err
+			return stacktrace.Propagate(err, "could not set helper data")
		}
	}

	switch strings.ToLower(experimentsDetails.Sequence) {
	case "serial":
-		if err = injectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-			return err
+		if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+			return stacktrace.Propagate(err, "could not run chaos in serial mode")
		}
	case "parallel":
-		if err = injectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-			return err
+		if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+			return stacktrace.Propagate(err, "could not run chaos in parallel mode")
		}
	default:
-		return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
	}

	//Waiting for the ramp time after chaos injection
@@ -79,13 +86,13 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
}

// injectChaosInSerialMode stress the io of all the target nodes serially (one by one)
-func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
-
-	labelSuffix := common.GetRunID()
+func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInSerialMode")
+	defer span.End()

	// run the probes during chaos
	if len(resultDetails.ProbeDetails) != 0 {
-		if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+		if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
			return err
		}
	}
@@ -104,52 +111,45 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
			"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
		})

-		experimentsDetails.RunID = common.GetRunID()
+		experimentsDetails.RunID = stringutils.GetRunID()

		// Creating the helper pod to perform node io stress
-		if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
-			return errors.Errorf("unable to create the helper pod, err: %v", err)
+		if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
+			return stacktrace.Propagate(err, "could not create helper pod")
		}

-		appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
+		appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)

		//Checking the status of helper pod
		log.Info("[Status]: Checking the status of the helper pod")
		if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
-			common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
-			return errors.Errorf("helper pod is not in running state, err: %v", err)
-		}
-
-		common.SetTargets(appNode, "injected", "node", chaosDetails)
-
-		log.Info("[Wait]: Waiting till the completion of the helper pod")
-		podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
-		common.SetTargets(appNode, "reverted", "node", chaosDetails)
-		if err != nil || podStatus == "Failed" {
-			common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
-			return common.HelperFailedError(err)
+			common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
+			return stacktrace.Propagate(err, "could not check helper status")
		}

-		//Deleting the helper pod
-		log.Info("[Cleanup]: Deleting the helper pod")
-		if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
-			return errors.Errorf("unable to delete the helper pod, err: %v", err)
+		common.SetTargets(appNode, "targeted", "node", chaosDetails)
+
+		if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
+			return err
		}
	}
	return nil
}

// injectChaosInParallelMode stress the io of all the target nodes in parallel mode (all at once)
-func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
-
-	labelSuffix := common.GetRunID()
+func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInParallelMode")
+	defer span.End()

	// run the probes during chaos
	if len(resultDetails.ProbeDetails) != 0 {
-		if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+		if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
			return err
		}
	}

+	experimentsDetails.RunID = stringutils.GetRunID()
+
	for _, appNode := range targetNodeList {

		if experimentsDetails.EngineName != "" {
@@ -164,57 +164,37 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
			"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
		})

-		experimentsDetails.RunID = common.GetRunID()
-
		// Creating the helper pod to perform node io stress
-		if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
-			return errors.Errorf("unable to create the helper pod, err: %v", err)
+		if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
+			return stacktrace.Propagate(err, "could not create helper pod")
		}
	}

-	appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
-
-	//Checking the status of helper pod
-	log.Info("[Status]: Checking the status of the helper pod")
-	if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
-		common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
-		return errors.Errorf("helper pod is not in running state, err: %v", err)
-	}
+	appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)

	for _, appNode := range targetNodeList {
-		common.SetTargets(appNode, "injected", "node", chaosDetails)
+		common.SetTargets(appNode, "targeted", "node", chaosDetails)
	}

-	log.Info("[Wait]: Waiting till the completion of the helper pod")
-	podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
-	for _, appNode := range targetNodeList {
-		common.SetTargets(appNode, "reverted", "node", chaosDetails)
-	}
-	if err != nil || podStatus == "Failed" {
-		common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
-		return common.HelperFailedError(err)
-	}
-
-	//Deleting the helper pod
-	log.Info("[Cleanup]: Deleting the helper pod")
-	if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
-		return errors.Errorf("unable to delete the helper pod, err: %v", err)
+	if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
+		return err
	}
	return nil
}

// createHelperPod derive the attributes for helper pod and create the helper pod
-func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, labelSuffix string) error {
+func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
+	ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeIOStressFaultHelperPod")
+	defer span.End()

	terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)

	helperPod := &apiv1.Pod{
		ObjectMeta: v1.ObjectMeta{
-			Name:        experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
+			GenerateName: experimentsDetails.ExperimentName + "-helper-",
			Namespace:   experimentsDetails.ChaosNamespace,
-			Labels:      common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, labelSuffix, experimentsDetails.ExperimentName),
+			Labels:      common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
			Annotations: chaosDetails.Annotations,
		},
		Spec: apiv1.PodSpec{
			RestartPolicy: apiv1.RestartPolicyNever,
@@ -236,8 +216,16 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
		},
	}

-	_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
-	return err
+	if len(chaosDetails.SideCar) != 0 {
+		helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
+		helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
+	}
+
+	if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
+		return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
+	}
+	return nil
}

// getContainerArguments derives the args for the pumba stress helper pod
@@ -279,8 +267,8 @@ func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails
	return stressArgs
}

-//setChaosTunables will setup a random value within a given range of values
-//If the value is not provided in range it'll setup the initial provided value.
+// setChaosTunables will set up a random value within a given range of values
+// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
	experimentsDetails.FilesystemUtilizationBytes = common.ValidateRange(experimentsDetails.FilesystemUtilizationBytes)
	experimentsDetails.FilesystemUtilizationPercentage = common.ValidateRange(experimentsDetails.FilesystemUtilizationPercentage)
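As a rough, assumed illustration: the common.ManagerHelperLifecycle call used in the helper management above is not shown in this compare view, so the stand-in below only suggests the sequence such a wrapper is expected to cover (status check, wait for completion, cleanup), with stubbed steps and a hypothetical label. The real helper in pkg/utils/common may differ in both signature and behavior.

// Stubbed sketch of a consolidated helper-pod lifecycle wrapper.
package main

import "fmt"

func checkHelperStatus(label string) error           { fmt.Println("check", label); return nil }
func waitForCompletion(label string) (string, error) { fmt.Println("wait", label); return "Succeeded", nil }
func deleteAllPods(label string) error               { fmt.Println("cleanup", label); return nil }

func manageHelperLifecycle(appLabel string) error {
	if err := checkHelperStatus(appLabel); err != nil {
		deleteAllPods(appLabel)
		return err
	}
	podStatus, err := waitForCompletion(appLabel)
	if err != nil || podStatus == "Failed" {
		deleteAllPods(appLabel)
		return fmt.Errorf("helper failed: %v", err)
	}
	return deleteAllPods(appLabel)
}

func main() {
	_ = manageHelperLifecycle("app=node-io-stress-helper-abc12") // hypothetical label
}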


@ -2,34 +2,41 @@ package lib
import ( import (
"context" "context"
"fmt"
"strconv" "strconv"
"strings" "strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-memory-hog/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1" apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
// PrepareNodeMemoryHog contains prepration steps before chaos injection // PrepareNodeMemoryHog contains preparation steps before chaos injection
func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareNodeMemoryHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeMemoryHogFault")
defer span.End()
//setup the tunables if provided in range //set up the tunables if provided in range
setChaosTunables(experimentsDetails) setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{ log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
"MemoryConsumptionMebibytes": experimentsDetails.MemoryConsumptionMebibytes, "MemoryConsumptionMebibytes": experimentsDetails.MemoryConsumptionMebibytes,
"MemoryConsumptionPercentage": experimentsDetails.MemoryConsumptionPercentage, "MemoryConsumptionPercentage": experimentsDetails.MemoryConsumptionPercentage,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers, "NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Node Affce Perc": experimentsDetails.NodesAffectedPerc, "Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence, "Sequence": experimentsDetails.Sequence,
}) })
@ -43,8 +50,9 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc) nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients) targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get node list")
} }
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{ log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList), "No. Of Nodes": len(targetNodeList),
"Node Names": targetNodeList, "Node Names": targetNodeList,
@ -52,21 +60,21 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil { if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err return stacktrace.Propagate(err, "could not set helper data")
} }
} }
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -78,13 +86,13 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
} }
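The error handling throughout these hunks moves from errors.Errorf wrapping to stacktrace.Propagate (plus structured cerrors values for terminal failures). A small, self-contained sketch of how Propagate and RootCause behave; the failing getNodeList here is a stand-in, not the real selector:

package main

import (
	"errors"
	"fmt"

	"github.com/palantir/stacktrace"
)

// getNodeList is a stand-in failure, not the real node selector.
func getNodeList() ([]string, error) {
	return nil, errors.New("no node found with the given label")
}

func prepare() error {
	nodes, err := getNodeList()
	if err != nil {
		// Propagate keeps the original cause and records the call site,
		// so the surfaced error reads like a breadcrumb trail.
		return stacktrace.Propagate(err, "could not get node list")
	}
	_ = nodes
	return nil
}

func main() {
	if err := prepare(); err != nil {
		fmt.Println(err)                       // wrapped message with added context
		fmt.Println(stacktrace.RootCause(err)) // the original cause
	}
}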
// injectChaosInSerialMode stress the memory of all the target nodes serially (one by one) // injectChaosInSerialMode stress the memory of all the target nodes serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInSerialMode")
labelSuffix := common.GetRunID() defer span.End()
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -103,68 +111,50 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes, "Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
}) })
experimentsDetails.RunID = common.GetRunID() experimentsDetails.RunID = stringutils.GetRunID()
//Getting node memory details //Getting node memory details
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients) memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil { if err != nil {
return errors.Errorf("unable to get the node memory details, err: %v", err) return stacktrace.Propagate(err, "could not get node memory details")
} }
//Getting the exact memory value to exhaust //Getting the exact memory value to exhaust
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, clients, memoryCapacity, memoryAllocatable) MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil { if err != nil {
return errors.Errorf("memory calculation failed, err: %v", err) return stacktrace.Propagate(err, "could not calculate memory consumption value")
} }
// Creating the helper pod to perform node memory hog // Creating the helper pod to perform node memory hog
if err = createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix, MemoryConsumption); err != nil { if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err) return stacktrace.Propagate(err, "could not create helper pod")
} }
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(appNode, "targeted", "node", chaosDetails) common.SetTargets(appNode, "targeted", "node", chaosDetails)
// Wait till the completion of helper pod if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
log.Info("[Wait]: Waiting till the completion of the helper pod") return err
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
} else if podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod status is %v", podStatus)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
} }
} }
return nil return nil
} }
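The status-check, wait-for-completion, and cleanup boilerplate that used to be inlined in each mode is now delegated to the shared lifecycle helper (common.ManagerHelperLifecycle). As a rough, hypothetical sketch of what such a consolidated routine does with plain client-go; the litmus implementation additionally honors the job cleanup policy and the timeouts carried in chaosDetails:

package helperlifecycle

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// manageHelperLifecycle is a stand-in for the consolidated helper: wait until
// every pod matching appLabel has finished, fail fast if one fails, then clean up.
func manageHelperLifecycle(ctx context.Context, client kubernetes.Interface, namespace, appLabel string, timeout time.Duration) error {
	err := wait.PollUntilContextTimeout(ctx, 2*time.Second, timeout, true, func(ctx context.Context) (bool, error) {
		pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: appLabel})
		if err != nil {
			return false, err
		}
		if len(pods.Items) == 0 {
			return false, fmt.Errorf("no helper pod found with label %q", appLabel)
		}
		for _, p := range pods.Items {
			switch p.Status.Phase {
			case corev1.PodFailed:
				return false, fmt.Errorf("helper pod %s failed", p.Name)
			case corev1.PodSucceeded:
				continue
			default:
				return false, nil // still pending/running, keep polling
			}
		}
		return true, nil
	})
	if err != nil {
		return err
	}
	// Delete the completed helper pods (the real helper also honors the job cleanup policy).
	return client.CoreV1().Pods(namespace).DeleteCollection(ctx, metav1.DeleteOptions{}, metav1.ListOptions{LabelSelector: appLabel})
}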
// injectChaosInParallelMode stress the memory all the target nodes in parallel mode (all at once) // injectChaosInParallelMode stress the memory all the target nodes in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInParallelMode")
labelSuffix := common.GetRunID() defer span.End()
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList { for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
@ -179,54 +169,32 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes, "Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
}) })
experimentsDetails.RunID = common.GetRunID()
//Getting node memory details //Getting node memory details
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients) memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil { if err != nil {
return errors.Errorf("unable to get the node memory details, err: %v", err) return stacktrace.Propagate(err, "could not get node memory details")
} }
//Getting the exact memory value to exhaust //Getting the exact memory value to exhaust
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, clients, memoryCapacity, memoryAllocatable) MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil { if err != nil {
return errors.Errorf("memory calculation failed, err: %v", err) return stacktrace.Propagate(err, "could not calculate memory consumption value")
} }
// Creating the helper pod to perform node memory hog // Creating the helper pod to perform node memory hog
if err = createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix, MemoryConsumption); err != nil { if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err) return stacktrace.Propagate(err, "could not create helper pod")
} }
} }
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, appNode := range targetNodeList { for _, appNode := range targetNodeList {
common.SetTargets(appNode, "targeted", "node", chaosDetails) common.SetTargets(appNode, "targeted", "node", chaosDetails)
} }
// Wait till the completion of helper pod if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
log.Info("[Wait]: Waiting till the completion of the helper pod") return err
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
} else if podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod status is %v", podStatus)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
} }
return nil return nil
@ -234,25 +202,23 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
// getNodeMemoryDetails will return the total memory capacity and memory allocatable of an application node // getNodeMemoryDetails will return the total memory capacity and memory allocatable of an application node
func getNodeMemoryDetails(appNodeName string, clients clients.ClientSets) (int, int, error) { func getNodeMemoryDetails(appNodeName string, clients clients.ClientSets) (int, int, error) {
nodeDetails, err := clients.GetNode(appNodeName, 180, 2)
nodeDetails, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), appNodeName, v1.GetOptions{})
if err != nil { if err != nil {
return 0, 0, err return 0, 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: err.Error()}
} }
memoryCapacity := int(nodeDetails.Status.Capacity.Memory().Value()) memoryCapacity := int(nodeDetails.Status.Capacity.Memory().Value())
memoryAllocatable := int(nodeDetails.Status.Allocatable.Memory().Value()) memoryAllocatable := int(nodeDetails.Status.Allocatable.Memory().Value())
if memoryCapacity == 0 || memoryAllocatable == 0 { if memoryCapacity == 0 || memoryAllocatable == 0 {
return memoryCapacity, memoryAllocatable, errors.Errorf("failed to get memory details of the application node") return memoryCapacity, memoryAllocatable, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: "failed to get memory details of the target node"}
} }
return memoryCapacity, memoryAllocatable, nil return memoryCapacity, memoryAllocatable, nil
} }
// calculateMemoryConsumption will calculate the amount of memory to be consumed for a given unit. // calculateMemoryConsumption will calculate the amount of memory to be consumed for a given unit.
func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, memoryCapacity, memoryAllocatable int) (string, error) { func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDetails, memoryCapacity, memoryAllocatable int) (string, error) {
var totalMemoryConsumption int var totalMemoryConsumption int
var MemoryConsumption string var MemoryConsumption string
@ -279,12 +245,12 @@ func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
//Getting the total memory under chaos //Getting the total memory under chaos
memoryConsumptionPercentage, _ := strconv.ParseFloat(experimentsDetails.MemoryConsumptionPercentage, 64) memoryConsumptionPercentage, _ := strconv.ParseFloat(experimentsDetails.MemoryConsumptionPercentage, 64)
memoryForChaos := ((memoryConsumptionPercentage / 100) * float64(memoryCapacity)) memoryForChaos := (memoryConsumptionPercentage / 100) * float64(memoryCapacity)
//Get the percentage of memory under chaos wrt allocatable memory //Get the percentage of memory under chaos wrt allocatable memory
totalMemoryConsumption = int((float64(memoryForChaos) / float64(memoryAllocatable)) * 100) totalMemoryConsumption = int((memoryForChaos / float64(memoryAllocatable)) * 100)
if totalMemoryConsumption > 100 { if totalMemoryConsumption > 100 {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %d percent, which is more than 100 percent (%d percent) of Allocatable Memory, so the experiment will only consume upto 100 percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption) log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is more than 100 percent (%d percent) of Allocatable Memory, so the experiment will only consume upto 100 percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
MemoryConsumption = "100%" MemoryConsumption = "100%"
} else { } else {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is %d percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption) log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is %d percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
@ -310,20 +276,22 @@ func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
} }
return MemoryConsumption, nil return MemoryConsumption, nil
} }
return "", errors.Errorf("please specify the memory consumption value either in percentage or mebibytes in a non-decimal format using respective envs") return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: "specify the memory consumption value either in percentage or mebibytes in a non-decimal format using respective envs"}
} }
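The percentage branch above converts a capacity-relative percentage into an allocatable-relative one before handing it to the stress tool, capping at 100%. A tiny worked example of that arithmetic with made-up numbers:

package main

import "fmt"

// consumptionFromPercentage converts a capacity-relative percentage into an
// allocatable-relative one, capped at 100%, mirroring calculateMemoryConsumption.
// capacity and allocatable are in MiB here just to keep the numbers small.
func consumptionFromPercentage(pct float64, capacityMiB, allocatableMiB int) string {
	memoryForChaos := (pct / 100) * float64(capacityMiB)
	totalPct := int((memoryForChaos / float64(allocatableMiB)) * 100)
	if totalPct > 100 {
		totalPct = 100
	}
	return fmt.Sprintf("%d%%", totalPct)
}

func main() {
	// 90% of a 16384 MiB node whose allocatable memory is 14893 MiB
	fmt.Println(consumptionFromPercentage(90, 16384, 14893)) // 99%
}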
// createHelperPod derive the attributes for helper pod and create the helper pod // createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, labelSuffix, MemoryConsumption string) error { func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, MemoryConsumption string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeMemoryHogFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds) terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{ helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{ ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID, GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace, Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, labelSuffix, experimentsDetails.ExperimentName), Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations, Annotations: chaosDetails.Annotations,
}, },
Spec: apiv1.PodSpec{ Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever, RestartPolicy: apiv1.RestartPolicyNever,
@ -352,12 +320,20 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
}, },
} }
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{}) if len(chaosDetails.SideCar) != 0 {
return err helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
} }
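Two details of the new createHelperPod are worth calling out: the pod now uses GenerateName, so the API server appends a unique suffix instead of the code concatenating the run ID into the name, and any sidecars declared in chaosDetails are appended to the pod spec before creation. A hedged sketch with plain client-go; the image, command, and label values are placeholders rather than the experiment's exact ones:

package helperpod

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createHelperPod builds a helper pod with GenerateName and optionally appends
// sidecar containers before creating it.
func createHelperPod(ctx context.Context, client kubernetes.Interface, namespace, experiment, runID string, sidecars []corev1.Container) error {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: experiment + "-helper-",
			Namespace:    namespace,
			Labels: map[string]string{
				"app": fmt.Sprintf("%s-helper-%s", experiment, runID),
			},
		},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:    experiment,
				Image:   "litmuschaos/go-runner:latest", // placeholder image
				Command: []string{"stress-ng"},          // placeholder command
			}},
		},
	}
	// Sidecars (if any) ride along in the same pod spec.
	pod.Spec.Containers = append(pod.Spec.Containers, sidecars...)

	_, err := client.CoreV1().Pods(namespace).Create(ctx, pod, metav1.CreateOptions{})
	return err
}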
//setChaosTunables will setup a random value within a given range of values // setChaosTunables will set up a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value. // If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) { func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.MemoryConsumptionMebibytes = common.ValidateRange(experimentsDetails.MemoryConsumptionMebibytes) experimentsDetails.MemoryConsumptionMebibytes = common.ValidateRange(experimentsDetails.MemoryConsumptionMebibytes)
experimentsDetails.MemoryConsumptionPercentage = common.ValidateRange(experimentsDetails.MemoryConsumptionPercentage) experimentsDetails.MemoryConsumptionPercentage = common.ValidateRange(experimentsDetails.MemoryConsumptionPercentage)

View File

@ -6,19 +6,21 @@ import (
"strconv" "strconv"
"strings" "strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-restart/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-restart/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1" apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
corev1 "k8s.io/kubernetes/pkg/apis/core"
) )
var err error var err error
@ -32,17 +34,20 @@ const (
privateKeySecret string = "private-key-cm-" privateKeySecret string = "private-key-cm-"
emptyDirVolume string = "empty-dir-" emptyDirVolume string = "empty-dir-"
ObjectNameField = "metadata.name"
) )
// PrepareNodeRestart contains preparation steps before chaos injection // PrepareNodeRestart contains preparation steps before chaos injection
func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareNodeRestart(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeRestartFault")
defer span.End()
//Select the node //Select the node
if experimentsDetails.TargetNode == "" { if experimentsDetails.TargetNode == "" {
//Select node for node-restart //Select node for node-restart
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients) experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get node name")
} }
} }
@ -50,7 +55,7 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
if experimentsDetails.TargetNodeIP == "" { if experimentsDetails.TargetNodeIP == "" {
experimentsDetails.TargetNodeIP, err = getInternalIP(experimentsDetails.TargetNode, clients) experimentsDetails.TargetNodeIP, err = getInternalIP(experimentsDetails.TargetNode, clients)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get internal ip")
} }
} }
@ -59,8 +64,7 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
"Target Node IP": experimentsDetails.TargetNodeIP, "Target Node IP": experimentsDetails.TargetNodeIP,
}) })
experimentsDetails.RunID = common.GetRunID() experimentsDetails.RunID = stringutils.GetRunID()
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
@ -79,39 +83,19 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
} }
// Creating the helper pod to perform node restart // Creating the helper pod to perform node restart
if err = createHelperPod(experimentsDetails, chaosDetails, clients); err != nil { if err = createHelperPod(ctx, experimentsDetails, chaosDetails, clients); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err) return stacktrace.Propagate(err, "could not create helper pod")
} }
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod //Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod") if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil { return err
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
} }
common.SetTargets(experimentsDetails.TargetNode, "targeted", "node", chaosDetails) if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -119,14 +103,17 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", strconv.Itoa(experimentsDetails.RampTime)) log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", strconv.Itoa(experimentsDetails.RampTime))
common.WaitForDuration(experimentsDetails.RampTime) common.WaitForDuration(experimentsDetails.RampTime)
} }
return nil return nil
} }
// createHelperPod derive the attributes for helper pod and create the helper pod // createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, clients clients.ClientSets) error { func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, clients clients.ClientSets) error {
// This method is attaching emptyDir along with secret volume, and copy data from secret // This method is attaching emptyDir along with secret volume, and copy data from secret
// to the emptyDir, because secret is mounted as readonly and with 777 perms and it can't be changed // to the emptyDir, because secret is mounted as readonly and with 777 perms and it can't be changed
// because of: https://github.com/kubernetes/kubernetes/issues/57923 // because of: https://github.com/kubernetes/kubernetes/issues/57923
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeRestartFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds) terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
@ -134,7 +121,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
ObjectMeta: v1.ObjectMeta{ ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID, Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace, Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, "", experimentsDetails.ExperimentName), Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations, Annotations: chaosDetails.Annotations,
}, },
Spec: apiv1.PodSpec{ Spec: apiv1.PodSpec{
@ -148,7 +135,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
{ {
MatchFields: []apiv1.NodeSelectorRequirement{ MatchFields: []apiv1.NodeSelectorRequirement{
{ {
Key: corev1.ObjectNameField, Key: ObjectNameField,
Operator: apiv1.NodeSelectorOpNotIn, Operator: apiv1.NodeSelectorOpNotIn,
Values: []string{experimentsDetails.TargetNode}, Values: []string{experimentsDetails.TargetNode},
}, },
@ -199,20 +186,28 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
}, },
} }
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{}) if len(chaosDetails.SideCar) != 0 {
return err helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
} }
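The node-restart helper keeps its hard-coded anti-affinity: it must not land on the node it is about to restart, which the spec expresses with a MatchFields requirement on the built-in metadata.name node field (now referenced via the local ObjectNameField constant instead of the k8s.io/kubernetes internal package). Roughly:

package noderestart

import (
	corev1 "k8s.io/api/core/v1"
)

// antiAffinityForTarget keeps the helper pod off the node it is going to
// restart by matching on the built-in metadata.name node field.
func antiAffinityForTarget(targetNode string) *corev1.Affinity {
	return &corev1.Affinity{
		NodeAffinity: &corev1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
				NodeSelectorTerms: []corev1.NodeSelectorTerm{{
					MatchFields: []corev1.NodeSelectorRequirement{{
						Key:      "metadata.name", // the ObjectNameField constant above
						Operator: corev1.NodeSelectorOpNotIn,
						Values:   []string{targetNode},
					}},
				}},
			},
		},
	}
}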
// getInternalIP gets the internal ip of the given node // getInternalIP gets the internal ip of the given node
func getInternalIP(nodeName string, clients clients.ClientSets) (string, error) { func getInternalIP(nodeName string, clients clients.ClientSets) (string, error) {
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), nodeName, v1.GetOptions{}) node, err := clients.GetNode(nodeName, 180, 2)
if err != nil { if err != nil {
return "", err return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: err.Error()}
} }
for _, addr := range node.Status.Addresses { for _, addr := range node.Status.Addresses {
if strings.ToLower(string(addr.Type)) == "internalip" { if strings.ToLower(string(addr.Type)) == "internalip" {
return addr.Address, nil return addr.Address, nil
} }
} }
return "", errors.Errorf("unable to find the internal ip of the %v node", nodeName) return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: "failed to get the internal ip of the target node"}
} }
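getInternalIP is unchanged in spirit: fetch the node (now through the retrying clients.GetNode wrapper) and scan Status.Addresses for the InternalIP entry the helper will connect to. An equivalent sketch using the typed client directly:

package noderestart

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// internalIP returns the node's InternalIP address, if one is reported.
func internalIP(ctx context.Context, client kubernetes.Interface, nodeName string) (string, error) {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	for _, addr := range node.Status.Addresses {
		if addr.Type == corev1.NodeInternalIP {
			return addr.Address, nil
		}
	}
	return "", fmt.Errorf("no internal ip found for node %s", nodeName)
}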

View File

@ -2,13 +2,19 @@ package lib
import ( import (
"context" "context"
"fmt"
"os" "os"
"os/signal" "os/signal"
"strings" "strings"
"syscall" "syscall"
"time" "time"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-taint/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-taint/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
@ -16,9 +22,7 @@ import (
"github.com/litmuschaos/litmus-go/pkg/status" "github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
apiv1 "k8s.io/api/core/v1" apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
var ( var (
@ -26,8 +30,10 @@ var (
inject, abort chan os.Signal inject, abort chan os.Signal
) )
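The inject/abort channels above feed the fault's abort watcher: OS signals are forwarded to a goroutine that reverts in-flight chaos (here, removing the taint) before the process exits. A minimal stand-alone sketch of that wiring:

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// abort receives OS signals so in-flight chaos can be reverted before exit.
	abort := make(chan os.Signal, 1)
	signal.Notify(abort, os.Interrupt, syscall.SIGTERM)

	go func() {
		<-abort
		fmt.Println("[Abort]: reverting chaos (removing the taint) before exit")
		os.Exit(1)
	}()

	time.Sleep(2 * time.Second) // stand-in for the chaos duration
}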
//PrepareNodeTaint contains the prepration steps before chaos injection // PrepareNodeTaint contains the preparation steps before chaos injection
func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareNodeTaint(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeTaintFault")
defer span.End()
// inject channel is used to transmit signal notifications. // inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1) inject = make(chan os.Signal, 1)
@ -49,7 +55,7 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
//Select node for kubelet-service-kill //Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients) experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get node name")
} }
} }
@ -61,7 +67,7 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -70,21 +76,28 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails) go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)
// taint the application node // taint the application node
if err := taintNode(experimentsDetails, clients, chaosDetails); err != nil { if err := taintNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not taint node")
} }
// Verify the status of AUT after reschedule // Verify the status of AUT after reschedule
log.Info("[Status]: Verify the status of AUT after reschedule") log.Info("[Status]: Verify the status of AUT after reschedule")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil { if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
return errors.Errorf("application status check failed, err: %v", err) log.Info("[Revert]: Reverting chaos because application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
return err
} }
// Verify the status of Auxiliary Applications after reschedule
if experimentsDetails.AuxiliaryAppInfo != "" { if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running") log.Info("[Status]: Verify that the Auxiliary Applications are running")
if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil { if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return errors.Errorf("auxiliary Applications status check failed, err: %v", err) log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
return err
} }
} }
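New in this hunk: if the post-injection status checks fail, the taint is reverted before returning, and both the check error and any revert error are preserved (litmus wraps them in cerrors.PreserveError). A rough sketch of the same flow using the standard library's errors.Join as a stand-in for that wrapper:

package nodetaint

import (
	"errors"
	"fmt"
)

// verifyAndMaybeRevert mirrors the new flow: on a failed post-injection check
// the taint is removed first, and both failures stay visible to the caller.
func verifyAndMaybeRevert(checkStatus, removeTaint func() error) error {
	if err := checkStatus(); err != nil {
		fmt.Println("[Revert]: Reverting chaos because application status check failed")
		if taintErr := removeTaint(); taintErr != nil {
			return errors.Join(err, taintErr) // stand-in for cerrors.PreserveError
		}
		return err
	}
	return nil
}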
@ -96,7 +109,7 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
// remove taint from the application node // remove taint from the application node
if err := removeTaintFromNode(experimentsDetails, clients, chaosDetails); err != nil { if err := removeTaintFromNode(experimentsDetails, clients, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not remove taint from node")
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -108,7 +121,9 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
} }
// taintNode taint the application node // taintNode taint the application node
func taintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error { func taintNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeTaintFault")
defer span.End()
// get the taint labels & effect // get the taint labels & effect
taintKey, taintValue, taintEffect := getTaintDetails(experimentsDetails) taintKey, taintValue, taintEffect := getTaintDetails(experimentsDetails)
@ -116,9 +131,9 @@ func taintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
log.Infof("Add %v taints to the %v node", taintKey+"="+taintValue+":"+taintEffect, experimentsDetails.TargetNode) log.Infof("Add %v taints to the %v node", taintKey+"="+taintValue+":"+taintEffect, experimentsDetails.TargetNode)
// get the node details // get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{}) node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil || node == nil { if err != nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.TargetNode, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
} }
// check if the taint already exists // check if the taint already exists
@ -142,9 +157,8 @@ func taintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
Effect: apiv1.TaintEffect(taintEffect), Effect: apiv1.TaintEffect(taintEffect),
}) })
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(context.Background(), node, v1.UpdateOptions{}) if err := clients.UpdateNode(chaosDetails, node); err != nil {
if err != nil || updatedNodeWithTaint == nil { return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to add taints: %s", err.Error())}
return errors.Errorf("failed to update %v node after adding taints, err: %v", experimentsDetails.TargetNode, err)
} }
} }
@ -163,9 +177,9 @@ func removeTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails,
taintKey := strings.Split(taintLabel[0], "=")[0] taintKey := strings.Split(taintLabel[0], "=")[0]
// get the node details // get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{}) node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil || node == nil { if err != nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.TargetNode, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
} }
// check if the taint already exists // check if the taint already exists
@ -178,17 +192,16 @@ func removeTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails,
} }
if tainted { if tainted {
var Newtaints []apiv1.Taint var newTaints []apiv1.Taint
// remove all the taints with matching key // remove all the taints with matching key
for _, taint := range node.Spec.Taints { for _, taint := range node.Spec.Taints {
if taint.Key != taintKey { if taint.Key != taintKey {
Newtaints = append(Newtaints, taint) newTaints = append(newTaints, taint)
} }
} }
node.Spec.Taints = Newtaints node.Spec.Taints = newTaints
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(context.Background(), node, v1.UpdateOptions{}) if err := clients.UpdateNode(chaosDetails, node); err != nil {
if err != nil || updatedNodeWithTaint == nil { return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to remove taints: %s", err.Error())}
return errors.Errorf("failed to update %v node after removing taints, err: %v", experimentsDetails.TargetNode, err)
} }
} }
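Taint injection and revert are both read-modify-write operations on the Node object, now routed through the retrying clients.GetNode/UpdateNode wrappers. Stripped of the litmus plumbing, the core of the two operations looks roughly like this:

package nodetaint

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// addTaint appends the chaos taint to the node spec and updates the node.
func addTaint(ctx context.Context, client kubernetes.Interface, nodeName, key, value string, effect corev1.TaintEffect) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{Key: key, Value: value, Effect: effect})
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

// removeTaint filters out every taint with a matching key and updates the node.
func removeTaint(ctx context.Context, client kubernetes.Interface, nodeName, key string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	var kept []corev1.Taint
	for _, t := range node.Spec.Taints {
		if t.Key != key {
			kept = append(kept, t)
		}
	}
	node.Spec.Taints = kept
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}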

View File

@ -2,16 +2,22 @@ package lib
import ( import (
"context" "context"
"math" "fmt"
"os" "os"
"os/signal" "os/signal"
"strings" "strings"
"syscall" "syscall"
"time" "time"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-autoscaler/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-autoscaler/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/math"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
@ -20,8 +26,6 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
appsv1 "k8s.io/client-go/kubernetes/typed/apps/v1" appsv1 "k8s.io/client-go/kubernetes/typed/apps/v1"
retries "k8s.io/client-go/util/retry" retries "k8s.io/client-go/util/retry"
"github.com/pkg/errors"
) )
var ( var (
@ -30,8 +34,10 @@ var (
appsv1StatefulsetClient appsv1.StatefulSetInterface appsv1StatefulsetClient appsv1.StatefulSetInterface
) )
//PreparePodAutoscaler contains the prepration steps and chaos injection steps // PreparePodAutoscaler contains the preparation steps and chaos injection steps
func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PreparePodAutoscaler(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodAutoscalerFault")
defer span.End()
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
@ -46,9 +52,9 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
switch strings.ToLower(experimentsDetails.AppKind) { switch strings.ToLower(experimentsDetails.AppKind) {
case "deployment", "deployments": case "deployment", "deployments":
appsUnderTest, err := getDeploymentDetails(experimentsDetails, clients) appsUnderTest, err := getDeploymentDetails(experimentsDetails)
if err != nil { if err != nil {
return errors.Errorf("fail to get the name & initial replica count of the deployment, err: %v", err) return stacktrace.Propagate(err, "could not get deployment details")
} }
deploymentList := []string{} deploymentList := []string{}
@ -63,22 +69,22 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
//calling go routine which will continuously watch for the abort signal //calling go routine which will continuously watch for the abort signal
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails) go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err = podAutoscalerChaosInDeployment(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil { if err = podAutoscalerChaosInDeployment(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("fail to perform autoscaling, err: %v", err) return stacktrace.Propagate(err, "could not scale deployment")
} }
if err = autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil { if err = autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to rollback the autoscaling, err: %v", err) return stacktrace.Propagate(err, "could not revert scaling in deployment")
} }
case "statefulset", "statefulsets": case "statefulset", "statefulsets":
appsUnderTest, err := getStatefulsetDetails(experimentsDetails, clients) appsUnderTest, err := getStatefulsetDetails(experimentsDetails)
if err != nil { if err != nil {
return errors.Errorf("fail to get the name & initial replica count of the statefulset, err: %v", err) return stacktrace.Propagate(err, "could not get statefulset details")
} }
stsList := []string{} var stsList []string
for _, sts := range appsUnderTest { for _, sts := range appsUnderTest {
stsList = append(stsList, sts.AppName) stsList = append(stsList, sts.AppName)
} }
@ -90,16 +96,16 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
//calling go routine which will continuously watch for the abort signal //calling go routine which will continuously watch for the abort signal
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails) go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err = podAutoscalerChaosInStatefulset(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil { if err = podAutoscalerChaosInStatefulset(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("fail to perform autoscaling, err: %v", err) return stacktrace.Propagate(err, "could not scale statefulset")
} }
if err = autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil { if err = autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to rollback the autoscaling, err: %v", err) return stacktrace.Propagate(err, "could not revert scaling in statefulset")
} }
default: default:
return errors.Errorf("application type '%s' is not supported for the chaos", experimentsDetails.AppKind) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{kind: %s}", experimentsDetails.AppKind), Reason: "application type is not supported"}
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -110,38 +116,38 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
return nil return nil
} }
func getSliceOfTotalApplicationsTargeted(appList []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) { func getSliceOfTotalApplicationsTargeted(appList []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails) []experimentTypes.ApplicationUnderTest {
slice := int(math.Round(float64(len(appList)*experimentsDetails.AppAffectPercentage) / float64(100))) newAppListLength := math.Maximum(1, math.Adjustment(math.Minimum(experimentsDetails.AppAffectPercentage, 100), len(appList)))
if slice < 0 || slice > len(appList) { return appList[:newAppListLength]
return nil, errors.Errorf("slice of applications to target out of range %d/%d", slice, len(appList))
}
return appList[:slice], nil
} }
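getSliceOfTotalApplicationsTargeted no longer returns an error: the percentage is clamped and the result is forced to at least one target via the litmus math helpers (Minimum/Adjustment/Maximum). A small equivalent with the standard library, assuming the same at-least-one, at-most-all semantics (rounding details may differ from the litmus helpers):

package main

import (
	"fmt"
	"math"
)

// targetCount mirrors the "at least one, at most all" rule used to pick how
// many matching applications are scaled, given APP_AFFECTED_PERC.
func targetCount(percentage, total int) int {
	if percentage > 100 {
		percentage = 100
	}
	n := int(math.Round(float64(total*percentage) / 100.0))
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	fmt.Println(targetCount(50, 5)) // 3 of 5 deployments
	fmt.Println(targetCount(0, 5))  // still 1, so the fault always has a target
}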
//getDeploymentDetails is used to get the name and total number of replicas of the deployment // getDeploymentDetails is used to get the name and total number of replicas of the deployment
func getDeploymentDetails(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) ([]experimentTypes.ApplicationUnderTest, error) { func getDeploymentDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
deploymentList, err := appsv1DeploymentClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel}) deploymentList, err := appsv1DeploymentClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(deploymentList.Items) == 0 { if err != nil {
return nil, errors.Errorf("fail to get the deployments with matching labels, err: %v", err) return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(deploymentList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: "no deployment found with matching labels"}
} }
appsUnderTest := []experimentTypes.ApplicationUnderTest{} var appsUnderTest []experimentTypes.ApplicationUnderTest
for _, app := range deploymentList.Items { for _, app := range deploymentList.Items {
log.Infof("[Info]: Found deployment name '%s' with replica count '%d'", app.Name, int(*app.Spec.Replicas)) log.Infof("[Info]: Found deployment name '%s' with replica count '%d'", app.Name, int(*app.Spec.Replicas))
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)}) appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
} }
// Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale // Applying the APP_AFFECTED_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails) return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
} }
//getStatefulsetDetails is used to get the name and total number of replicas of the statefulsets // getStatefulsetDetails is used to get the name and total number of replicas of the statefulsets
func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) ([]experimentTypes.ApplicationUnderTest, error) { func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
statefulsetList, err := appsv1StatefulsetClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel}) statefulsetList, err := appsv1StatefulsetClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(statefulsetList.Items) == 0 { if err != nil {
return nil, errors.Errorf("fail to get the statefulsets with matching labels, err: %v", err) return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(statefulsetList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: "no statefulset found with matching labels"}
} }
appsUnderTest := []experimentTypes.ApplicationUnderTest{} appsUnderTest := []experimentTypes.ApplicationUnderTest{}
@ -150,11 +156,11 @@ func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)}) appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
} }
// Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale // Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails) return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
} }
//podAutoscalerChaosInDeployment scales up the replicas of deployment and verify the status // podAutoscalerChaosInDeployment scales up the replicas of deployment and verify the status
func podAutoscalerChaosInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func podAutoscalerChaosInDeployment(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Scale Application // Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error { retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
@ -163,33 +169,29 @@ func podAutoscalerChaosInDeployment(experimentsDetails *experimentTypes.Experime
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver // RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{}) appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil { if err != nil {
return errors.Errorf("fail to get latest version of application deployment, err: %v", err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
} }
// modifying the replica count // modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas)) appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
log.Infof("Updating deployment '%s' to number of replicas '%d'", appUnderTest.ObjectMeta.Name, experimentsDetails.Replicas) log.Infof("Updating deployment '%s' to number of replicas '%d'", appUnderTest.ObjectMeta.Name, experimentsDetails.Replicas)
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{}) _, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale deployment :%s", err.Error())}
} }
common.SetTargets(app.AppName, "injected", "deployment", chaosDetails) common.SetTargets(app.AppName, "injected", "deployment", chaosDetails)
} }
return nil return nil
}) })
if retryErr != nil { if retryErr != nil {
return errors.Errorf("fail to update the replica count of the deployment, err: %v", retryErr) return retryErr
} }
log.Info("[Info]: The application started scaling") log.Info("[Info]: The application started scaling")
if err = deploymentStatusCheck(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil { return deploymentStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
return errors.Errorf("application deployment status check failed, err: %v", err)
}
return nil
} }
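
The scale-up itself is wrapped in RetryOnConflict so that a concurrent update to the Deployment only triggers a fresh read and another attempt rather than a failure. A compact sketch of that pattern, assuming the retries alias refers to client-go's retry utility:

package lib

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    retries "k8s.io/client-go/util/retry"
)

// scaleDeployment re-reads the Deployment on every attempt, so a conflict
// (someone else updated it first) is resolved by retrying with fresh state.
func scaleDeployment(client kubernetes.Interface, ns, name string, replicas int32) error {
    deployClient := client.AppsV1().Deployments(ns)
    return retries.RetryOnConflict(retries.DefaultRetry, func() error {
        deploy, err := deployClient.Get(context.Background(), name, metav1.GetOptions{})
        if err != nil {
            return err
        }
        deploy.Spec.Replicas = &replicas
        _, err = deployClient.Update(context.Background(), deploy, metav1.UpdateOptions{})
        return err
    })
}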
//podAutoscalerChaosInStatefulset scales up the replicas of statefulset and verify the status // podAutoscalerChaosInStatefulset scales up the replicas of statefulset and verify the status
func podAutoscalerChaosInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func podAutoscalerChaosInStatefulset(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Scale Application // Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error { retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
@ -198,36 +200,31 @@ func podAutoscalerChaosInStatefulset(experimentsDetails *experimentTypes.Experim
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver // RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{}) appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil { if err != nil {
return errors.Errorf("fail to get latest version of the target statefulset application , err: %v", err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
} }
// modifying the replica count // modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas)) appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{}) _, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale statefulset :%s", err.Error())}
} }
common.SetTargets(app.AppName, "injected", "statefulset", chaosDetails) common.SetTargets(app.AppName, "injected", "statefulset", chaosDetails)
} }
return nil return nil
}) })
if retryErr != nil { if retryErr != nil {
return errors.Errorf("fail to update the replica count of the statefulset application, err: %v", retryErr) return retryErr
} }
log.Info("[Info]: The application started scaling") log.Info("[Info]: The application started scaling")
if err = statefulsetStatusCheck(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil { return statefulsetStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
return errors.Errorf("statefulset application status check failed, err: %v", err)
}
return nil
} }
// deploymentStatusCheck check the status of deployment and verify the available replicas // deploymentStatusCheck check the status of deployment and verify the available replicas
func deploymentStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func deploymentStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin //ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now() ChaosStartTimeStamp := time.Now()
isFailed := false
err = retry. err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)). Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
@ -236,33 +233,29 @@ func deploymentStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails
for _, app := range appsUnderTest { for _, app := range appsUnderTest {
deployment, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{}) deployment, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil { if err != nil {
return errors.Errorf("fail to find the deployment with name %v, err: %v", app.AppName, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
} }
if int(deployment.Status.ReadyReplicas) != experimentsDetails.Replicas { if int(deployment.Status.ReadyReplicas) != experimentsDetails.Replicas {
isFailed = true return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale deployment, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, deployment.Status.ReadyReplicas)}
return errors.Errorf("application %s is not scaled yet, the desired replica count is: %v and ready replica count is: %v", app.AppName, experimentsDetails.Replicas, deployment.Status.ReadyReplicas)
} }
} }
isFailed = false
return nil return nil
}) })
if isFailed {
if err = autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to perform the autoscaler recovery of the deployment, err: %v", err)
}
return errors.Errorf("fail to scale the deployment to the desired replica count in the given chaos duration")
}
if err != nil { if err != nil {
return err if scaleErr := autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
}
return stacktrace.Propagate(err, "failed to scale replicas")
} }
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
duration := int(time.Since(ChaosStartTimeStamp).Seconds()) duration := int(time.Since(ChaosStartTimeStamp).Seconds())
if duration < experimentsDetails.ChaosDuration { if duration < experimentsDetails.ChaosDuration {
log.Info("[Wait]: Waiting for completion of chaos duration") log.Info("[Wait]: Waiting for completion of chaos duration")
@ -273,11 +266,10 @@ func deploymentStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails
} }
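
When the ready-replica check fails, the fault first tries to roll the targets back; if that rollback also fails, both root causes are preserved in a single error string so neither is lost in the ChaosResult. The same dual-failure handling in isolation, using the PreserveError and stacktrace helpers shown in the hunk above:

package lib

import (
    "fmt"

    "github.com/litmuschaos/litmus-go/pkg/cerrors"
    "github.com/palantir/stacktrace"
)

// reportScaleFailure keeps both the scaling error and any rollback error.
func reportScaleFailure(scaleErr, revertErr error) error {
    if revertErr != nil {
        // both operations failed: preserve both root causes in one string
        return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]",
            stacktrace.RootCause(scaleErr).Error(), stacktrace.RootCause(revertErr).Error())}
    }
    // rollback succeeded: propagate only the original failure with context
    return stacktrace.Propagate(scaleErr, "failed to scale replicas")
}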
// statefulsetStatusCheck check the status of statefulset and verify the available replicas // statefulsetStatusCheck check the status of statefulset and verify the available replicas
func statefulsetStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func statefulsetStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin //ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now() ChaosStartTimeStamp := time.Now()
isFailed := false
err = retry. err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)). Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
@ -286,30 +278,25 @@ func statefulsetStatusCheck(experimentsDetails *experimentTypes.ExperimentDetail
for _, app := range appsUnderTest { for _, app := range appsUnderTest {
statefulset, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{}) statefulset, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil { if err != nil {
return errors.Errorf("fail to find the statefulset with name %v, err: %v", app.AppName, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
} }
if int(statefulset.Status.ReadyReplicas) != experimentsDetails.Replicas { if int(statefulset.Status.ReadyReplicas) != experimentsDetails.Replicas {
isFailed = true return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale statefulset, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, statefulset.Status.ReadyReplicas)}
return errors.Errorf("application %s is not scaled yet, the desired replica count is: %v and ready replica count is: %v", app.AppName, experimentsDetails.Replicas, statefulset.Status.ReadyReplicas)
} }
} }
isFailed = false
return nil return nil
}) })
if isFailed {
if err = autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to perform the autoscaler recovery of the application, err: %v", err)
}
return errors.Errorf("fail to scale the application to the desired replica count in the given chaos duration")
}
if err != nil { if err != nil {
return err if scaleErr := autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
}
return stacktrace.Propagate(err, "failed to scale replicas")
} }
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -323,7 +310,7 @@ func statefulsetStatusCheck(experimentsDetails *experimentTypes.ExperimentDetail
return nil return nil
} }
//autoscalerRecoveryInDeployment rollback the replicas to initial values in deployment // autoscalerRecoveryInDeployment rollback the replicas to initial values in deployment
func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error { func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
// Scale back to initial number of replicas // Scale back to initial number of replicas
@ -333,20 +320,20 @@ func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.Experime
for _, app := range appsUnderTest { for _, app := range appsUnderTest {
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{}) appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil { if err != nil {
return errors.Errorf("fail to find the latest version of Application Deployment with name %v, err: %v", app.AppName, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
} }
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{}) _, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in deployment :%s", err.Error())}
} }
common.SetTargets(app.AppName, "reverted", "deployment", chaosDetails) common.SetTargets(app.AppName, "reverted", "deployment", chaosDetails)
} }
return nil return nil
}) })
if retryErr != nil { if retryErr != nil {
return errors.Errorf("fail to rollback the deployment, err: %v", retryErr) return retryErr
} }
log.Info("[Info]: Application started rolling back to original replica count") log.Info("[Info]: Application started rolling back to original replica count")
@ -357,11 +344,11 @@ func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.Experime
for _, app := range appsUnderTest { for _, app := range appsUnderTest {
applicationDeploy, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{}) applicationDeploy, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil { if err != nil {
return errors.Errorf("fail to find the deployment with name %v, err: %v", app.AppName, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
} }
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount { if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("[Info]: Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas) log.Infof("[Info]: Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
return errors.Errorf("fail to rollback to original replica count, err: %v", err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback deployment scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
} }
} }
log.Info("[RollBack]: Application rollback to the initial number of replicas") log.Info("[RollBack]: Application rollback to the initial number of replicas")
@ -369,7 +356,7 @@ func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.Experime
}) })
} }
//autoscalerRecoveryInStatefulset rollback the replicas to initial values in deployment // autoscalerRecoveryInStatefulset rollback the replicas to initial values in deployment
func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error { func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
// Scale back to initial number of replicas // Scale back to initial number of replicas
@ -379,20 +366,20 @@ func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.Experim
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver // RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{}) appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil { if err != nil {
return errors.Errorf("failed to find the latest version of Statefulset with name %v, err: %v", app.AppName, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
} }
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{}) _, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in statefulset :%s", err.Error())}
} }
common.SetTargets(app.AppName, "reverted", "statefulset", chaosDetails) common.SetTargets(app.AppName, "reverted", "statefulset", chaosDetails)
} }
return nil return nil
}) })
if retryErr != nil { if retryErr != nil {
return errors.Errorf("fail to rollback the statefulset, err: %v", retryErr) return retryErr
} }
log.Info("[Info]: Application pod started rolling back") log.Info("[Info]: Application pod started rolling back")
@ -403,11 +390,11 @@ func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.Experim
for _, app := range appsUnderTest { for _, app := range appsUnderTest {
applicationDeploy, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{}) applicationDeploy, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil { if err != nil {
return errors.Errorf("fail to get the statefulset with name %v, err: %v", app.AppName, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
} }
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount { if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas) log.Infof("Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
return errors.Errorf("fail to roll back to original replica count, err: %v", err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback statefulset scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
} }
} }
log.Info("[RollBack]: Application roll back to initial number of replicas") log.Info("[RollBack]: Application roll back to initial number of replicas")
@ -417,7 +404,7 @@ func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.Experim
func int32Ptr(i int32) *int32 { return &i } func int32Ptr(i int32) *int32 { return &i }
//abortPodAutoScalerChaos go routine will continuously watch for the abort signal for the entire chaos duration and generate the required events and result // abortPodAutoScalerChaos go routine will continuously watch for the abort signal for the entire chaos duration and generate the required events and result
func abortPodAutoScalerChaos(appsUnderTest []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) { func abortPodAutoScalerChaos(appsUnderTest []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) {
// signChan channel is used to transmit signal notifications. // signChan channel is used to transmit signal notifications.



@ -1,13 +1,20 @@
package lib package lib
import ( import (
"context"
"fmt"
"os" "os"
"os/signal" "os/signal"
"strings" "strings"
"syscall" "syscall"
"time" "time"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog-exec/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
@ -16,36 +23,61 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec" litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1" corev1 "k8s.io/api/core/v1"
) )
var inject chan os.Signal var inject chan os.Signal
// PrepareCPUExecStress contains the chaos preparation and injection steps
func PrepareCPUExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodCPUHogExecFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
if err := experimentCPU(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not stress cpu")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
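
The reworked entry point also opens an OpenTelemetry span per fault phase, so preparation and injection show up as child spans of the experiment trace. A minimal sketch of that instrumentation pattern; the tracer name below is a placeholder standing in for telemetry.TracerName:

package lib

import (
    "context"

    "go.opentelemetry.io/otel"
)

// tracerName is a stand-in for telemetry.TracerName used by the faults above.
const tracerName = "litmus-go"

// withSpan wraps a fault phase in a span so its duration and outcome are traced.
func withSpan(ctx context.Context, phase string, fn func(context.Context) error) error {
    ctx, span := otel.Tracer(tracerName).Start(ctx, phase)
    defer span.End()
    return fn(ctx)
}

Used this way, a call such as withSpan(ctx, "PreparePodCPUHogExecFault", func(ctx context.Context) error { return experimentCPU(ctx, ...) }) mirrors the span opened at the top of PrepareCPUExecStress.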
// stressCPU Uses the REST API to exec into the target container of the target pod // stressCPU Uses the REST API to exec into the target container of the target pod
// The function will be constantly increasing the CPU utilisation until it reaches the maximum available or allowed number. // The function will be constantly increasing the CPU utilisation until it reaches the maximum available or allowed number.
// Using the TOTAL_CHAOS_DURATION we will need to specify for how long this experiment will last // Using the TOTAL_CHAOS_DURATION we will need to specify for how long this experiment will last
func stressCPU(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, stressErr chan error) { func stressCPU(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
// It will contains all the pod & container details required for exec command // It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{} execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosInjectCmd} command := []string{"/bin/sh", "-c", experimentsDetails.ChaosInjectCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS) litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
_, err := litmusexec.Exec(&execCommandDetails, clients, command) _, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err stressErr <- err
} }
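
stressCPU now receives the pod's namespace explicitly and uses the three-value Exec (stdout, stderr, error), which is what lets the kill path further down include command output in its revert error. A sketch of launching one stress process and reporting its result on a channel, reusing the litmusexec calls from the hunk above:

package lib

import (
    "github.com/litmuschaos/litmus-go/pkg/clients"
    litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
)

// runStress execs the given shell command inside the target container and
// reports the exec error (if any) on errCh, mirroring stressCPU above.
func runStress(clients clients.ClientSets, pod, container, ns, cmd string, errCh chan<- error) {
    details := litmusexec.PodDetails{}
    litmusexec.SetExecCommandAttributes(&details, pod, container, ns)
    _, _, err := litmusexec.Exec(&details, clients, []string{"/bin/sh", "-c", cmd})
    errCh <- err
}

Each requested CPU core gets its own goroutine (for i := 0; i < cores; i++ { go runStress(...) }), and the caller watches errCh alongside the abort signal and the chaos-duration timer.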
//experimentCPU function orchestrates the experiment by calling the StressCPU function for every core, of every container, of every pod that is targeted // experimentCPU function orchestrates the experiment by calling the StressCPU function for every core, of every container, of every pod that is targeted
func experimentCPU(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func experimentCPU(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
} }
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails) targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get target pods")
} }
podNames := []string{} podNames := []string{}
@ -54,30 +86,31 @@ func experimentCPU(experimentsDetails *experimentTypes.ExperimentDetails, client
} }
log.Infof("Target pods list for chaos, %v", podNames) log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "") experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
return nil return nil
} }
// injectChaosInSerialMode stressed the cpu of all target application serially (one by one) // injectChaosInSerialMode stressed the cpu of all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInSerialMode")
defer span.End()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -109,10 +142,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{ log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@ -122,7 +152,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}) })
for i := 0; i < experimentsDetails.CPUcores; i++ { for i := 0; i < experimentsDetails.CPUcores; i++ {
go stressCPU(experimentsDetails, pod.Name, clients, stressErr) go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
} }
common.SetTargets(pod.Name, "injected", "pod", chaosDetails) common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
@ -142,18 +172,20 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Warn("Chaos process OOM killed") log.Warn("Chaos process OOM killed")
return nil return nil
} }
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
} }
case <-signChan: case <-signChan:
log.Info("[Chaos]: Revert Started") log.Info("[Chaos]: Revert Started")
err := killStressCPUSerial(experimentsDetails, pod.Name, clients, chaosDetails) if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
if err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err) log.Errorf("Error in Kill stress after abortion, err: %v", err)
} }
// updating the chaosresult after stopped // updating the chaosresult after stopped
failStep := "Chaos injection stopped!" err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep) failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT") types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed") log.Info("[Chaos]: Revert Completed")
os.Exit(1) os.Exit(1)
case <-endTime: case <-endTime:
@ -162,8 +194,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
break loop break loop
} }
} }
if err := killStressCPUSerial(experimentsDetails, pod.Name, clients, chaosDetails); err != nil { if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not revert cpu stress")
} }
} }
} }
@ -171,13 +203,16 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
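
On an abort signal the fault reverts the injected stress first, then records the aborted state in the ChaosResult with a structured "experiment aborted" error before exiting. A condensed sketch of that revert-then-report sequence, assuming the usual litmus-go result package import path; the calls themselves follow the hunk above:

package lib

import (
    "os"

    "github.com/litmuschaos/litmus-go/pkg/cerrors"
    "github.com/litmuschaos/litmus-go/pkg/clients"
    "github.com/litmuschaos/litmus-go/pkg/log"
    "github.com/litmuschaos/litmus-go/pkg/result"
    "github.com/litmuschaos/litmus-go/pkg/types"
)

// handleAbort reverts chaos once, marks the result as Stopped, and terminates
// the process so no further chaos is injected after the signal.
func handleAbort(revert func() error, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) {
    log.Info("[Chaos]: Revert Started")
    if err := revert(); err != nil {
        log.Errorf("Error in Kill stress after abortion, err: %v", err)
    }
    abortErr := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
    failStep, errCode := cerrors.GetRootCauseAndErrorCode(abortErr, string(chaosDetails.Phase))
    types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
    if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
        log.Errorf("failed to update chaos result %s", err.Error())
    }
    log.Info("[Chaos]: Revert Completed")
    os.Exit(1)
}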
// injectChaosInParallelMode stressed the cpu of all target application in parallel mode (all at once) // injectChaosInParallelMode stressed the cpu of all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine // creating err channel to receive the error from the go routine
stressErr := make(chan error) stressErr := make(chan error)
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -205,10 +240,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
} }
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{ log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@ -217,7 +249,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"CPU CORE": experimentsDetails.CPUcores, "CPU CORE": experimentsDetails.CPUcores,
}) })
for i := 0; i < experimentsDetails.CPUcores; i++ { for i := 0; i < experimentsDetails.CPUcores; i++ {
go stressCPU(experimentsDetails, pod.Name, clients, stressErr) go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
} }
common.SetTargets(pod.Name, "injected", "pod", chaosDetails) common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
} }
@ -238,7 +270,7 @@ loop:
log.Warn("Chaos process OOM killed") log.Warn("Chaos process OOM killed")
return nil return nil
} }
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
} }
case <-signChan: case <-signChan:
log.Info("[Chaos]: Revert Started") log.Info("[Chaos]: Revert Started")
@ -246,9 +278,12 @@ loop:
log.Errorf("Error in Kill stress after abortion, err: %v", err) log.Errorf("Error in Kill stress after abortion, err: %v", err)
} }
// updating the chaosresult after stopped // updating the chaosresult after stopped
failStep := "Chaos injection stopped!" err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep) failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT") types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed") log.Info("[Chaos]: Revert Completed")
os.Exit(1) os.Exit(1)
case <-endTime: case <-endTime:
@ -260,43 +295,19 @@ loop:
return killStressCPUParallel(experimentsDetails, targetPodList, clients, chaosDetails) return killStressCPUParallel(experimentsDetails, targetPodList, clients, chaosDetails)
} }
//PrepareCPUExecStress contains the chaos prepration and injection steps
func PrepareCPUExecStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
if err := experimentCPU(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// killStressCPUSerial function to kill a stress process running inside target container // killStressCPUSerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment //
func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error { // Triggered by either timeout of chaos duration or termination of the experiment
// It will contains all the pod & container details required for exec command func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{} execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosKillCmd} command := []string{"/bin/sh", "-c", experimentsDetails.ChaosKillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS) litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
_, err := litmusexec.Exec(&execCommandDetails, clients, command) out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil { if err != nil {
return errors.Errorf("Unable to kill the stress process in %v pod, err: %v", podName, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, ns), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
} }
common.SetTargets(podName, "reverted", "pod", chaosDetails) common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil return nil
@ -305,12 +316,14 @@ func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails,
// killStressCPUParallel function to kill all the stress process running inside target container // killStressCPUParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment // Triggered by either timeout of chaos duration or termination of the experiment
func killStressCPUParallel(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error { func killStressCPUParallel(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
var errList []string
for _, pod := range targetPodList.Items { for _, pod := range targetPodList.Items {
if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
if err := killStressCPUSerial(experimentsDetails, pod.Name, clients, chaosDetails); err != nil { errList = append(errList, err.Error())
return err
} }
} }
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil return nil
} }
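
The parallel kill no longer bails out on the first pod that fails to revert; it keeps reverting the remaining pods and only then returns every collected failure as one PreserveError. The same best-effort pattern in isolation:

package lib

import (
    "fmt"
    "strings"

    "github.com/litmuschaos/litmus-go/pkg/cerrors"
)

// revertAll applies revert to every target and reports all failures at once,
// so one broken pod does not leave the others un-reverted.
func revertAll(targets []string, revert func(target string) error) error {
    var errList []string
    for _, t := range targets {
        if err := revert(t); err != nil {
            errList = append(errList, err.Error())
        }
    }
    if len(errList) != 0 {
        return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
    }
    return nil
}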


@ -2,27 +2,32 @@ package lib
import ( import (
"context" "context"
"fmt"
"strconv" "strconv"
"strings" "strings"
"time" "time"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status" "github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/annotation"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/workloads"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1" "go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
//PreparePodDelete contains the prepration steps before chaos injection // PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDeleteFault")
defer span.End()
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
@ -30,7 +35,7 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
common.WaitForDuration(experimentsDetails.RampTime) common.WaitForDuration(experimentsDetails.RampTime)
} }
//setup the tunables if provided in range //set up the tunables if provided in range
SetChaosTunables(experimentsDetails) SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{ log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
@ -40,15 +45,15 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err := injectChaosInSerialMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil { if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil { if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -60,14 +65,13 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
} }
// injectChaosInSerialMode delete the target application pods serial mode(one by one) // injectChaosInSerialMode delete the target application pods serial mode(one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInSerialMode")
defer span.End()
targetPodList := apiv1.PodList{}
var err error
var podsAffectedPerc int
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -80,49 +84,26 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
for duration < experimentsDetails.ChaosDuration { for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
} }
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc)
if experimentsDetails.NodeLabel == "" { targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails) if err != nil {
if err != nil { return stacktrace.Propagate(err, "could not get target pods")
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
} }
// deriving the parent name of the target resources // deriving the parent name of the target resources
if chaosDetails.AppDetail.Kind != "" {
for _, pod := range targetPodList.Items {
parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
if err != nil {
return err
}
common.SetParentName(parentName, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
}
}
podNames := []string{}
for _, pod := range targetPodList.Items { for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name) kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
} }
log.Infof("Target pods list: %v", podNames)
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod" msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
@ -137,18 +118,18 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"PodName": pod.Name}) "PodName": pod.Name})
if experimentsDetails.Force { if experimentsDetails.Force {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod}) err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else { } else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{}) err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
} }
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
} }
switch chaosDetails.Randomness { switch chaosDetails.Randomness {
case true: case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil { if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
return err return stacktrace.Propagate(err, "could not get random chaos interval")
} }
default: default:
//Waiting for the chaos interval after chaos injection //Waiting for the chaos interval after chaos injection
@ -161,8 +142,15 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Verify the status of pod after the chaos injection //Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod") log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil { for _, parent := range chaosDetails.ParentsResources {
return err target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
} }
duration = int(time.Since(ChaosStartTimeStamp).Seconds()) duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@ -176,14 +164,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
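
pod-delete now resolves each victim pod's owning workload through the dynamic client instead of annotation lookups, records those parents as targets, and later verifies recreation per workload rather than per label. A sketch of that derive-then-verify flow built from the calls in the hunks above (the helper name and the timeout/delay parameters are illustrative):

package lib

import (
    "github.com/litmuschaos/litmus-go/pkg/clients"
    "github.com/litmuschaos/litmus-go/pkg/status"
    "github.com/litmuschaos/litmus-go/pkg/types"
    "github.com/litmuschaos/litmus-go/pkg/utils/common"
    "github.com/litmuschaos/litmus-go/pkg/workloads"
    "github.com/palantir/stacktrace"
    corev1 "k8s.io/api/core/v1"
)

// recordParentsAndVerify derives the owner workload of every deleted pod,
// marks it as a target, and waits for its pods to come back healthy.
func recordParentsAndVerify(pods []corev1.Pod, clients clients.ClientSets, chaosDetails *types.ChaosDetails, timeout, delay int) error {
    for i := range pods {
        kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pods[i], clients.DynamicClient)
        if err != nil {
            return stacktrace.Propagate(err, "could not get pod owner name and kind")
        }
        common.SetParentName(parentName, kind, pods[i].Namespace, chaosDetails)
    }
    for _, parent := range chaosDetails.ParentsResources {
        common.SetTargets(parent.Name, "targeted", parent.Kind, chaosDetails)
        target := types.AppDetails{Names: []string{parent.Name}, Kind: parent.Kind, Namespace: parent.Namespace}
        if err := status.CheckUnTerminatedPodStatusesByWorkloadName(target, timeout, delay, clients); err != nil {
            return stacktrace.Propagate(err, "could not check pod statuses by workload names")
        }
    }
    return nil
}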
// injectChaosInParallelMode delete the target application pods in parallel mode (all at once) // injectChaosInParallelMode delete the target application pods in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInParallelMode")
defer span.End()
targetPodList := apiv1.PodList{}
var err error
var podsAffectedPerc int
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -196,49 +183,25 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for duration < experimentsDetails.ChaosDuration { for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
} }
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc) targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if experimentsDetails.NodeLabel == "" { if err != nil {
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails) return stacktrace.Propagate(err, "could not get target pods")
if err != nil {
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
} }
// deriving the parent name of the target resources // deriving the parent name of the target resources
if chaosDetails.AppDetail.Kind != "" {
for _, pod := range targetPodList.Items {
parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
if err != nil {
return err
}
common.SetParentName(parentName, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
}
}
podNames := []string{}
for _, pod := range targetPodList.Items { for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name) kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
} }
log.Infof("Target pods list: %v", podNames)
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod" msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
@ -253,19 +216,19 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"PodName": pod.Name}) "PodName": pod.Name})
if experimentsDetails.Force { if experimentsDetails.Force {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod}) err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else { } else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{}) err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
} }
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
} }
} }
switch chaosDetails.Randomness { switch chaosDetails.Randomness {
case true: case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil { if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
return err return stacktrace.Propagate(err, "could not get random chaos interval")
} }
default: default:
//Waiting for the chaos interval after chaos injection //Waiting for the chaos interval after chaos injection
@ -278,8 +241,15 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Verify the status of pod after the chaos injection //Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod") log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil { for _, parent := range chaosDetails.ParentsResources {
return err target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
} }
duration = int(time.Since(ChaosStartTimeStamp).Seconds()) duration = int(time.Since(ChaosStartTimeStamp).Seconds())
} }
@ -289,8 +259,8 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
return nil return nil
} }
//SetChaosTunables will setup a random value within a given range of values // SetChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value. // If the value is not provided as a range, it'll keep the initially provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) { func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc) experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence) experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)
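For reference, the range-resolution behaviour described in the SetChaosTunables comment above (a tunable given either as a single value or as a min-max range) can be sketched with the standard library alone; resolveRange below is an illustrative stand-in, not litmus-go's common.ValidateRange, and the min-max string format is an assumption of this sketch:

package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
)

// resolveRange returns the value unchanged when it is a plain number,
// and picks a random integer within the bounds when it looks like "min-max".
// Illustrative only; not the library's actual implementation.
func resolveRange(value string) string {
	parts := strings.Split(value, "-")
	if len(parts) != 2 {
		return value // single value (or empty): keep the provided input
	}
	lo, err1 := strconv.Atoi(parts[0])
	hi, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil || hi < lo {
		return value // malformed range: fall back to the original input
	}
	return strconv.Itoa(lo + rand.Intn(hi-lo+1))
}

func main() {
	fmt.Println(resolveRange("50"))    // stays "50"
	fmt.Println(resolveRange("20-80")) // random value between 20 and 80
}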


@ -2,8 +2,12 @@ package helper
import ( import (
"bytes" "bytes"
"context"
"fmt" "fmt"
"github.com/kyokomi/emoji" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os" "os"
"os/exec" "os/exec"
"os/signal" "os/signal"
@ -24,6 +28,7 @@ import (
var ( var (
abort, injectAbort chan os.Signal abort, injectAbort chan os.Signal
err error
) )
const ( const (
@ -32,7 +37,9 @@ const (
) )
// Helper injects the dns chaos // Helper injects the dns chaos
func Helper(clients clients.ClientSets) { func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodDNSFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{} experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{} eventsDetails := types.EventDetails{}
@ -63,23 +70,70 @@ func Helper(clients clients.ClientSets) {
result.SetResultUID(&resultDetails, clients, &chaosDetails) result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := preparePodDNSChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil { if err := preparePodDNSChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err) log.Fatalf("helper pod failed, err: %v", err)
} }
} }
//preparePodDNSChaos contains the preparation steps before chaos injection // preparePodDNSChaos contains the preparation steps before chaos injection
func preparePodDNSChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error { func preparePodDNSChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients) targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not parse targets")
} }
// extract out the pid of the target container var targets []targetDetails
pid, err := common.GetPID(experimentsDetails.ContainerRuntime, containerID, experimentsDetails.SocketPath)
if err != nil { for _, t := range targetList.Target {
return err td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-injectAbort:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
done := make(chan error, 1)
for index, t := range targets {
targets[index].Cmd, err = injectChaos(experimentsDetails, t)
if err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := terminateProcess(t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
} }
// record the event inside chaosengine // record the event inside chaosengine
@ -89,91 +143,136 @@ func preparePodDNSChaos(experimentsDetails *experimentTypes.ExperimentDetails, c
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine") events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
} }
// prepare dns interceptor log.Info("[Wait]: Waiting for chaos completion")
commandTemplate := fmt.Sprintf("sudo TARGET_PID=%d CHAOS_TYPE=%s SPOOF_MAP='%s' TARGET_HOSTNAMES='%s' CHAOS_DURATION=%d MATCH_SCHEME=%s nsutil -p -n -t %d -- dns_interceptor", pid, experimentsDetails.ChaosType, experimentsDetails.SpoofMap, experimentsDetails.TargetHostNames, experimentsDetails.ChaosDuration, experimentsDetails.MatchScheme, pid) // channel to check the completion of the stress process
cmd := exec.Command("/bin/bash", "-c", commandTemplate)
log.Info(cmd.String())
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
// injecting dns chaos inside target container
go func() { go func() {
select { var errList []string
case <-injectAbort: for _, t := range targets {
log.Info("[Chaos]: Abort received, skipping chaos injection") if err := t.Cmd.Wait(); err != nil {
default: errList = append(errList, err.Error())
err = cmd.Run()
if err != nil {
log.Fatalf("dns interceptor failed : %v", err)
} }
} }
if len(errList) != 0 {
log.Errorf("err: %v", strings.Join(errList, ", "))
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
}
done <- nil
}() }()
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil { // check the timeout for the command
if revertErr := terminateProcess(cmd); revertErr != nil { // Note: timeout will occur when the process hasn't completed even 30s after the chaos duration
return fmt.Errorf("failed to revert and annotate the result, err: %v", fmt.Sprintf("%s, %s", err.Error(), revertErr.Error())) timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
}
return err
}
timeChan := time.Tick(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
// either wait for abort signal or chaos duration
select { select {
case <-abort: case <-timeout:
log.Info("[Chaos]: Killing process started because of terminated signal received") // the stress process gets timeout before completion
case <-timeChan: log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Chaos]: Stopping the experiment, chaos duration over") log.Info("[Timeout]: Killing the stress process")
var errList []string
for _, t := range targets {
if err = terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
case doneErr := <-done:
select {
case <-injectAbort:
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
default:
log.Info("[Info]: Reverting Chaos")
var errList []string
for _, t := range targets {
if err := terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return doneErr
}
} }
log.Info("Chaos Revert Started") return nil
// retry thrice for the chaos revert }
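The new helper flow above starts each dns_interceptor with cmd.Start(), reports completion through a buffered done channel, and races that against a timeout derived from the chaos duration. A minimal standard-library sketch of that wait/timeout pattern (the sleep command and the durations are placeholders):

package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Stand-in for the injected chaos process; in the helper this is the
	// dns_interceptor started inside the target's network namespace.
	cmd := exec.Command("sleep", "2")
	if err := cmd.Start(); err != nil {
		fmt.Println("could not start process:", err)
		return
	}

	// Report completion (or failure) of the process on a channel.
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	// Allow a grace period beyond the intended chaos duration, mirroring
	// the ChaosDuration + 30 timeout used by the helper.
	chaosDuration := 1 * time.Second
	select {
	case <-time.After(chaosDuration + 3*time.Second):
		fmt.Println("process did not finish in time, killing it")
		_ = cmd.Process.Kill()
		<-done // reap the process after the kill
	case err := <-done:
		fmt.Println("process finished, revert can proceed, err:", err)
	}
}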
func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, t targetDetails) (*exec.Cmd, error) {
// prepare dns interceptor
var out bytes.Buffer
commandTemplate := fmt.Sprintf("sudo TARGET_PID=%d CHAOS_TYPE=%s SPOOF_MAP='%s' TARGET_HOSTNAMES='%s' CHAOS_DURATION=%d MATCH_SCHEME=%s nsutil -p -n -t %d -- dns_interceptor", t.Pid, experimentsDetails.ChaosType, experimentsDetails.SpoofMap, experimentsDetails.TargetHostNames, experimentsDetails.ChaosDuration, experimentsDetails.MatchScheme, t.Pid)
cmd := exec.Command("/bin/bash", "-c", commandTemplate)
log.Info(cmd.String())
cmd.Stdout = &out
cmd.Stderr = &out
if err = cmd.Start(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: experimentsDetails.ChaosPodName, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to inject chaos: %s", out.String())}
}
return cmd, nil
}
func terminateProcess(t targetDetails) error {
// kill command
killTemplate := fmt.Sprintf("sudo kill %d", t.Cmd.Process.Pid)
kill := exec.Command("/bin/bash", "-c", killTemplate)
var out bytes.Buffer
kill.Stderr = &out
kill.Stdout = &out
if err = kill.Run(); err != nil {
if strings.Contains(strings.ToLower(out.String()), ProcessAlreadyKilled) {
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to revert chaos %s", out.String())}
} else {
log.Errorf("dns interceptor process stopped")
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
return nil
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(targets []targetDetails, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("[Abort]: Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3 retry := 3
for retry > 0 { for retry > 0 {
if cmd.Process == nil { for _, t := range targets {
log.Infof("cannot kill dns interceptor, process not started. Retrying in 1sec...") if err = terminateProcess(t); err != nil {
} else { log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
log.Infof("killing dns interceptor with pid %v", cmd.Process.Pid) continue
if err := terminateProcess(cmd); err != nil { }
return err if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
} }
} }
retry-- retry--
time.Sleep(1 * time.Second) time.Sleep(1 * time.Second)
} }
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil { log.Info("[Abort]: Chaos Revert Completed")
return err os.Exit(1)
}
log.Info("Chaos Revert Completed")
return nil
} }
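abortWatcher blocks on the abort signal and then retries the revert a fixed number of times before exiting. A self-contained sketch of that signal-driven revert loop; revert() here is a placeholder for terminateProcess plus the chaosresult annotation:

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	abort := make(chan os.Signal, 1)
	signal.Notify(abort, os.Interrupt, syscall.SIGTERM)

	fmt.Println("chaos running; send SIGINT/SIGTERM to trigger the revert path")
	<-abort

	// Retry the revert a few times so a transient failure does not leave
	// the target in the injected state, mirroring the helper's loop.
	const retries = 3
	for attempt := 1; attempt <= retries; attempt++ {
		if err := revert(); err != nil {
			fmt.Printf("revert attempt %d failed: %v\n", attempt, err)
			time.Sleep(1 * time.Second)
			continue
		}
		break
	}
	fmt.Println("revert completed, exiting")
	os.Exit(1)
}

// revert is a placeholder for terminating the injected process and
// annotating the ChaosResult as "reverted".
func revert() error {
	fmt.Println("reverting chaos")
	return nil
}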
func terminateProcess(cmd *exec.Cmd) error { // getENV fetches all the env variables from the runner pod
// kill command
killTemplate := fmt.Sprintf("sudo kill %d", cmd.Process.Pid)
kill := exec.Command("/bin/bash", "-c", killTemplate)
var stderr bytes.Buffer
kill.Stderr = &stderr
if err := kill.Run(); err != nil {
if strings.Contains(strings.ToLower(stderr.String()), ProcessAlreadyKilled) {
return nil
}
log.Errorf("unable to kill dns interceptor process %v, err :%v", emoji.Sprint(":cry:"), err)
} else {
log.Errorf("dns interceptor process stopped")
}
return nil
}
//getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) { func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "") experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "") experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "60")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "60"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus") experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "") experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@ -186,3 +285,14 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosType = types.Getenv("CHAOS_TYPE", "error") experimentDetails.ChaosType = types.Getenv("CHAOS_TYPE", "error")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "") experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
} }
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
CommandPid int
Cmd *exec.Cmd
Source string
}
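The library now encodes each target as name:namespace:container and joins per-node targets with ';' before passing them to the helper through the TARGETS env, which the helper parses back into targetDetails. A small round-trip sketch of that encoding (the local target struct and decode function are illustrative; the helper itself uses common.ParseTargets):

package main

import (
	"fmt"
	"strings"
)

// target mirrors the fields the helper needs per pod; the real
// targetDetails struct also carries the container ID, PID, and command.
type target struct {
	Name            string
	Namespace       string
	TargetContainer string
}

// encode joins targets the same way the library builds the TARGETS value.
func encode(ts []target) string {
	parts := make([]string, 0, len(ts))
	for _, t := range ts {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.TargetContainer))
	}
	return strings.Join(parts, ";")
}

// decode is an illustrative inverse of encode.
func decode(s string) ([]target, error) {
	var ts []target
	for _, entry := range strings.Split(s, ";") {
		fields := strings.Split(entry, ":")
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed target %q, want name:namespace:container", entry)
		}
		ts = append(ts, target{Name: fields[0], Namespace: fields[1], TargetContainer: fields[2]})
	}
	return ts, nil
}

func main() {
	encoded := encode([]target{
		{Name: "nginx-0", Namespace: "default", TargetContainer: "nginx"},
		{Name: "nginx-1", Namespace: "default", TargetContainer: "nginx"},
	})
	fmt.Println(encoded)
	decoded, err := decode(encoded)
	fmt.Println(decoded, err)
}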


@ -2,33 +2,40 @@ package lib
import ( import (
"context" "context"
"fmt"
"os"
"strconv" "strconv"
"strings" "strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-dns-chaos/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-dns-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1" apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
//PrepareAndInjectChaos contains the preparation & injection steps // PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDNSFault")
defer span.End()
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
} }
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails) targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get target pods")
} }
podNames := []string{} podNames := []string{}
@ -47,41 +54,41 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
if experimentsDetails.ChaosServiceAccount == "" { if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients) experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil { if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err) return stacktrace.Propagate(err, "could not get experiment service account")
} }
} }
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil { if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err return stacktrace.Propagate(err, "could not set helper data")
} }
} }
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "") experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
return nil return nil
} }
// injectChaosInSerialMode inject the DNS Chaos in all target application serially (one by one) // injectChaosInSerialMode inject the DNS Chaos in all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInSerialMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -91,10 +98,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{ log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
@ -102,33 +106,15 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"NodeName": pod.Spec.NodeName, "NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer, "ContainerName": experimentsDetails.TargetContainer,
}) })
runID := common.GetRunID() runID := stringutils.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil { if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err) return stacktrace.Propagate(err, "could not create helper pod")
} }
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for pod-dns chaos
log.Info("[Cleanup]: Deleting the the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("Unable to delete the helper pods, err: %v", err)
} }
} }
@ -136,78 +122,53 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
// injectChaosInParallelMode inject the DNS Chaos in all target application in parallel mode (all at once) // injectChaosInParallelMode inject the DNS Chaos in all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
// creating the helper pod to perform DNS Chaos runID := stringutils.GetRunID()
for _, pod := range targetPodList.Items { targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//Get the target container name of the application pod for node, tar := range targets {
if !experimentsDetails.IsTargetContainerProvided { var targetsPerNode []string
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) for _, k := range tar.Target {
if err != nil { targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{ if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
"PodName": pod.Name, return stacktrace.Propagate(err, "could not create helper pod")
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
} }
} }
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for pod-dns chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("Unable to delete the helper pods, err: %v", err)
} }
return nil return nil
} }
// createHelperPod derive the attributes for helper pod and create the helper pod // createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, labelSuffix string) error { func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodDNSFaultHelperPod")
defer span.End()
privilegedEnable := true privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds) terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{ helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{ ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID, GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace, Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName), Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations, Annotations: chaosDetails.Annotations,
}, },
Spec: apiv1.PodSpec{ Spec: apiv1.PodSpec{
HostPID: true, HostPID: true,
@ -240,7 +201,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name dns-chaos", "./helpers -name dns-chaos",
}, },
Resources: chaosDetails.Resources, Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName), Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{ VolumeMounts: []apiv1.VolumeMount{
{ {
Name: "cri-socket", Name: "cri-socket",
@ -255,18 +216,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
}, },
} }
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{}) if len(chaosDetails.SideCar) != 0 {
return err helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
} }
// getPodEnv derive all the env required for the helper pod // getPodEnv derive all the env required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar { func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS). envDetails.SetEnv("TARGETS", targets).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)). SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace). SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName). SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@ -279,6 +245,8 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("MATCH_SCHEME", experimentsDetails.MatchScheme). SetEnv("MATCH_SCHEME", experimentsDetails.MatchScheme).
SetEnv("CHAOS_TYPE", experimentsDetails.ChaosType). SetEnv("CHAOS_TYPE", experimentsDetails.ChaosType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID). SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name") SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV return envDetails.ENV
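getPodEnv builds the helper environment through chained SetEnv calls. A minimal local sketch of that fluent builder pattern; the envVar and envDetails types below are stand-ins for the litmus-go and Kubernetes types, and skipping empty values is an assumption of this sketch rather than documented behaviour:

package main

import "fmt"

// envVar is a local stand-in for corev1.EnvVar.
type envVar struct {
	Name  string
	Value string
}

// envDetails mimics the fluent builder used by getPodEnv.
type envDetails struct {
	ENV []envVar
}

// SetEnv appends a variable and returns the receiver so calls can be chained.
// Empty values are dropped in this sketch; the real builder may differ.
func (e *envDetails) SetEnv(name, value string) *envDetails {
	if value == "" {
		return e
	}
	e.ENV = append(e.ENV, envVar{Name: name, Value: value})
	return e
}

func main() {
	var env envDetails
	env.SetEnv("TARGETS", "nginx-0:default:nginx").
		SetEnv("TOTAL_CHAOS_DURATION", "60").
		SetEnv("INSTANCE_ID", ""). // dropped: empty value
		SetEnv("CHAOS_NAMESPACE", "litmus")
	fmt.Println(env.ENV)
}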


@ -1,6 +1,7 @@
package lib package lib
import ( import (
"context"
"fmt" "fmt"
"os" "os"
"os/signal" "os/signal"
@ -8,7 +9,13 @@ import (
"syscall" "syscall"
"time" "time"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-fio-stress/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-fio-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
@ -16,15 +23,36 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec" litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1" corev1 "k8s.io/api/core/v1"
) )
// PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodFIOStressFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Fio stress experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressStorage uses the REST API to exec into the target container of the target pod // stressStorage uses the REST API to exec into the target container of the target pod
// The function will be constantly increasing the storage utilisation until it reaches the maximum available or allowed number. // The function will be constantly increasing the storage utilisation until it reaches the maximum available or allowed number.
// Using the TOTAL_CHAOS_DURATION we will need to specify for how long this experiment will last // Using the TOTAL_CHAOS_DURATION we will need to specify for how long this experiment will last
func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, stressErr chan error) { func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
log.Infof("The storage consumption is: %vM", experimentDetails.Size) log.Infof("The storage consumption is: %vM", experimentDetails.Size)
@ -37,23 +65,24 @@ func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName
log.Infof("Running the command:\n%v", fioCmd) log.Infof("Running the command:\n%v", fioCmd)
command := []string{"/bin/sh", "-c", fioCmd} command := []string{"/bin/sh", "-c", fioCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentDetails.TargetContainer, experimentDetails.AppNS) litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentDetails.TargetContainer, ns)
_, err := litmusexec.Exec(&execCommandDetails, clients, command) _, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err stressErr <- err
} }
//experimentExecution function orchestrates the experiment by calling the StressStorage function, of every container, of every pod that is targeted // experimentExecution function orchestrates the experiment by calling the StressStorage function, of every container, of every pod that is targeted
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide either of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
} }
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails) targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get target pods")
} }
podNames := []string{} podNames := []string{}
@ -62,31 +91,33 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
} }
log.Infof("Target pods list for chaos, %v", podNames) log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "") experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
return nil return nil
} }
// injectChaosInSerialMode stressed the storage of all target application in serial mode (one by one) // injectChaosInSerialMode stressed the storage of all target application in serial mode (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInSerialMode")
defer span.End()
// creating err channel to receive the error from the go routine // creating err channel to receive the error from the go routine
stressErr := make(chan error) stressErr := make(chan error)
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -103,10 +134,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{ log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@ -114,7 +142,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"Target Pod": pod.Name, "Target Pod": pod.Name,
"Space Consumption(MB)": experimentsDetails.Size, "Space Consumption(MB)": experimentsDetails.Size,
}) })
go stressStorage(experimentsDetails, pod.Name, clients, stressErr) go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration) log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
@ -130,19 +158,25 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
case err := <-stressErr: case err := <-stressErr:
// skipping the execution, if received any error other than 137, while executing stress command and marked result as fail // skipping the execution, if received any error other than 137, while executing stress command and marked result as fail
// it will ignore the error code 137(oom kill), it will skip further execution and marked the result as pass // it will ignore the error code 137(oom kill), it will skip further execution and marked the result as pass
// oom kill occurs if stor to be stressed exceed than the resource limit for the target container // oom kill occurs if the resource to be stressed exceeds the resource limit for the target container
if err != nil { if err != nil {
if strings.Contains(err.Error(), "137") { if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed") log.Warn("Chaos process OOM killed")
return nil return nil
} }
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress storage of target pod: %s", err.Error())}
} }
case <-signChan: case <-signChan:
log.Info("[Chaos]: Revert Started") log.Info("[Chaos]: Revert Started")
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil { if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err) log.Errorf("Error in Kill stress after abortion, err: %v", err)
} }
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed") log.Info("[Chaos]: Revert Completed")
os.Exit(1) os.Exit(1)
case <-endTime: case <-endTime:
@ -151,21 +185,23 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
break loop break loop
} }
} }
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil { if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err return stacktrace.Propagate(err, "could not revert chaos")
} }
} }
return nil return nil
} }
// injectChaosInParallelMode stressed the storage of all target application in parallel mode (all at once) // injectChaosInParallelMode stressed the storage of all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine // creating err channel to receive the error from the go routine
stressErr := make(chan error) stressErr := make(chan error)
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -182,10 +218,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
} }
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{ log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@ -193,7 +226,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"Target Pod": pod.Name, "Target Pod": pod.Name,
"Storage Consumption(MB)": experimentsDetails.Size, "Storage Consumption(MB)": experimentsDetails.Size,
}) })
go stressStorage(experimentsDetails, pod.Name, clients, stressErr) go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
} }
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration) log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
@ -209,19 +242,25 @@ loop:
case err := <-stressErr: case err := <-stressErr:
// skipping the execution, if received any error other than 137, while executing stress command and marked result as fail // skipping the execution, if received any error other than 137, while executing stress command and marked result as fail
// it will ignore the error code 137(oom kill), it will skip further execution and marked the result as pass // it will ignore the error code 137(oom kill), it will skip further execution and marked the result as pass
// oom kill occurs if stor to be stressed exceed than the resource limit for the target container // oom kill occurs if the resource to be stressed exceeds the resource limit for the target container
if err != nil { if err != nil {
if strings.Contains(err.Error(), "137") { if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed") log.Warn("Chaos process OOM killed")
return nil return nil
} }
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to inject chaos: %s", err.Error())}
} }
case <-signChan: case <-signChan:
log.Info("[Chaos]: Revert Started") log.Info("[Chaos]: Revert Started")
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil { if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err) log.Errorf("Error in Kill stress after abortion, err: %v", err)
} }
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed") log.Info("[Chaos]: Revert Completed")
os.Exit(1) os.Exit(1)
case <-endTime: case <-endTime:
@ -229,58 +268,41 @@ loop:
break loop break loop
} }
} }
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil { if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err return stacktrace.Propagate(err, "could revert chaos")
} }
return nil return nil
} }
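Both fio-stress modes watch three events at once while the stress command runs: the error channel fed by the exec goroutine, an abort signal, and the end of the chaos duration. A condensed standard-library sketch of that select loop (the stress goroutine here just sleeps and reports nil; the real code execs fio inside the target container):

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	stressErr := make(chan error, 1)
	go func() {
		// Stand-in for the fio exec; report its outcome when it returns.
		time.Sleep(2 * time.Second)
		stressErr <- nil
	}()

	signChan := make(chan os.Signal, 1)
	signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)

	chaosDuration := 3 * time.Second
	endTime := time.After(chaosDuration)

loop:
	for {
		select {
		case err := <-stressErr:
			if err != nil {
				fmt.Println("stress process failed:", err)
				return
			}
			// stress exited early without error; keep waiting for the duration
		case <-signChan:
			fmt.Println("abort received, reverting and exiting")
			os.Exit(1)
		case <-endTime:
			fmt.Println("chaos duration over")
			break loop
		}
	}
	fmt.Println("killing any remaining stress process and returning")
}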
//PrepareChaos contains the chaos prepration and injection steps
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Fio stress experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// killStressSerial function to kill a stress process running inside target container // killStressSerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment //
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressSerial(containerName, podName, namespace, KillCmd string, clients clients.ClientSets) error { func killStressSerial(containerName, podName, namespace, KillCmd string, clients clients.ClientSets) error {
// It will contains all the pod & container details required for exec command // It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{} execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", KillCmd} command := []string{"/bin/sh", "-c", KillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace) litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command) out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil { if err != nil {
return errors.Errorf("Unable to kill stress process inside target container, err: %v", err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
} }
return nil return nil
} }
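
The kill helpers above run their commands through the repo's litmusexec wrapper (SetExecCommandAttributes + Exec). As a rough, hedged sketch of the underlying client-go exec pattern such a wrapper typically builds on (the kubeconfig path, function name, and command below are illustrative, not the repo's API):

```go
package main

import (
	"bytes"
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/remotecommand"
)

// execInPod execs a command inside a container of a pod and returns its stdout.
func execInPod(kubeconfig, namespace, pod, container string, command []string) (string, error) {
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return "", err
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		return "", err
	}

	// Build the exec subresource request for the target pod/container.
	req := clientset.CoreV1().RESTClient().Post().
		Resource("pods").Name(pod).Namespace(namespace).SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: container,
			Command:   command,
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)

	executor, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
	if err != nil {
		return "", err
	}
	var stdout, stderr bytes.Buffer
	if err := executor.StreamWithContext(context.Background(), remotecommand.StreamOptions{Stdout: &stdout, Stderr: &stderr}); err != nil {
		return stderr.String(), err
	}
	return stdout.String(), nil
}

func main() {
	out, err := execInPod("/path/to/kubeconfig", "default", "target-pod", "app", []string{"/bin/sh", "-c", "kill -9 $(pgrep dd)"})
	fmt.Println(out, err)
}
```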
// killStressParallel function to kill all the stress process running inside target container // killStressParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment // Triggered by either timeout of chaos duration or termination of the experiment
func killStressParallel(containerName string, targetPodList corev1.PodList, namespace, KillCmd string, clients clients.ClientSets) error { func killStressParallel(containerName string, targetPodList corev1.PodList, KillCmd string, clients clients.ClientSets) error {
var errList []string
for _, pod := range targetPodList.Items { for _, pod := range targetPodList.Items {
if err := killStressSerial(containerName, pod.Name, pod.Namespace, KillCmd, clients); err != nil {
if err := killStressSerial(containerName, pod.Name, namespace, KillCmd, clients); err != nil { errList = append(errList, err.Error())
return err
} }
} }
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil return nil
} }
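
killStressParallel now collects every per-pod revert failure instead of returning on the first one, so all targets get a revert attempt. A minimal standalone sketch of that aggregation pattern, with fmt.Errorf standing in for the repo's cerrors.PreserveError:

```go
package main

import (
	"fmt"
	"strings"
)

// revertPod is a hypothetical stand-in for killStressSerial: it reverts chaos
// on a single pod and returns an error on failure.
func revertPod(pod string) error {
	if pod == "bad-pod" {
		return fmt.Errorf("exec failed in %s", pod)
	}
	return nil
}

// revertAll mirrors the killStressParallel pattern: try every pod, collect
// failures, and report them as one combined error at the end.
func revertAll(pods []string) error {
	var errList []string
	for _, pod := range pods {
		if err := revertPod(pod); err != nil {
			errList = append(errList, err.Error())
		}
	}
	if len(errList) != 0 {
		return fmt.Errorf("[%s]", strings.Join(errList, ","))
	}
	return nil
}

func main() {
	if err := revertAll([]string{"good-pod", "bad-pod"}); err != nil {
		fmt.Println("revert completed with errors:", err)
	}
}
```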

View File

@ -1,6 +1,7 @@
package lib package lib
import ( import (
"context"
"fmt" "fmt"
"os" "os"
"os/signal" "os/signal"
@ -9,7 +10,12 @@ import (
"syscall" "syscall"
"time" "time"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog-exec/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
@ -18,13 +24,39 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec" litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1" corev1 "k8s.io/api/core/v1"
) )
var inject chan os.Signal var inject chan os.Signal
// PrepareMemoryExecStress contains the chaos preparation and injection steps
func PrepareMemoryExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodMemoryHogExecFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
if err := experimentMemory(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not stress memory")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
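
PrepareMemoryExecStress, like the other refactored entry points, now opens an OpenTelemetry span and threads the derived context through the fault. A minimal sketch of that wrapper, assuming a hypothetical tracer name in place of the repo's telemetry.TracerName:

```go
package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
)

// tracerName is a hypothetical tracer name used only for this sketch.
const tracerName = "litmuschaos.io/litmus-go"

// prepareFault shows the wrapper added to every Prepare* entry point: start a
// named span from the incoming context, guarantee it is ended, and pass the
// derived context down so child spans nest under it.
func prepareFault(ctx context.Context) error {
	ctx, span := otel.Tracer(tracerName).Start(ctx, "PreparePodMemoryHogExecFault")
	defer span.End()

	// ramp time, chaos injection, and ramp time would run here, each
	// receiving ctx so their own spans become children of this one.
	_ = ctx
	return nil
}

func main() {
	fmt.Println(prepareFault(context.Background()))
}
```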
// stressMemory Uses the REST API to exec into the target container of the target pod // stressMemory Uses the REST API to exec into the target container of the target pod
// The function will be constantly increasing the Memory utilisation until it reaches the maximum available or allowed number. // The function will be constantly increasing the Memory utilisation until it reaches the maximum available or allowed number.
// Using the TOTAL_CHAOS_DURATION we will need to specify for how long this experiment will last // Using the TOTAL_CHAOS_DURATION we will need to specify for how long this experiment will last
@ -39,22 +71,23 @@ func stressMemory(MemoryConsumption, containerName, podName, namespace string, c
command := []string{"/bin/sh", "-c", ddCmd} command := []string{"/bin/sh", "-c", ddCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace) litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command) _, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err stressErr <- err
} }
//experimentMemory function orchestrates the experiment by calling the StressMemory function, of every container, of every pod that is targeted // experimentMemory function orchestrates the experiment by calling the StressMemory function, of every container, of every pod that is targeted
func experimentMemory(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func experimentMemory(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
} }
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails) targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get target pods")
} }
podNames := []string{} podNames := []string{}
@ -63,30 +96,31 @@ func experimentMemory(experimentsDetails *experimentTypes.ExperimentDetails, cli
} }
log.Infof("Target pods list for chaos, %v", podNames) log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "") experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
return nil return nil
} }
// injectChaosInSerialMode stressed the memory of all target application serially (one by one) // injectChaosInSerialMode stressed the memory of all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInSerialMode")
defer span.End()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -118,10 +152,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{ log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@ -129,7 +160,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"Target Pod": pod.Name, "Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption, "Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
}) })
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr) go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
common.SetTargets(pod.Name, "injected", "pod", chaosDetails) common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
@ -148,17 +179,20 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Warn("Chaos process OOM killed") log.Warn("Chaos process OOM killed")
return nil return nil
} }
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
} }
case <-signChan: case <-signChan:
log.Info("[Chaos]: Revert Started") log.Info("[Chaos]: Revert Started")
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil { if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err) log.Errorf("Error in Kill stress after abortion, err: %v", err)
} }
// updating the chaosresult after stopped // updating the chaosresult after stopped
failStep := "Chaos injection stopped!" err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep) failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT") types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed") log.Info("[Chaos]: Revert Completed")
os.Exit(1) os.Exit(1)
case <-endTime: case <-endTime:
@ -167,8 +201,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
break loop break loop
} }
} }
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil { if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not revert memory stress")
} }
} }
} }
@ -176,13 +210,15 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
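
The serial injector waits on three things at once: the stress goroutine's error, an abort signal, and the chaos-duration timer, treating exit code 137 (OOM kill) as a pass. A compact, self-contained sketch of that select loop with a simulated stress command:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/signal"
	"strings"
	"syscall"
	"time"
)

// runStress is a hypothetical stand-in for the exec'd dd command: it reports
// its outcome on the error channel exactly once.
func runStress(stressErr chan<- error) {
	time.Sleep(300 * time.Millisecond)
	stressErr <- errors.New("command terminated with exit code 137") // simulated OOM kill
}

func main() {
	stressErr := make(chan error)
	signChan := make(chan os.Signal, 1)
	signal.Notify(signChan, os.Interrupt, syscall.SIGTERM)
	endTime := time.After(2 * time.Second) // stands in for ChaosDuration

	go runStress(stressErr)

loop:
	for {
		select {
		case err := <-stressErr:
			// Exit code 137 means the stress process was OOM killed; the
			// experiment treats that as a pass, anything else fails injection.
			if err != nil {
				if strings.Contains(err.Error(), "137") {
					fmt.Println("stress process OOM killed, marking as pass")
					return
				}
				fmt.Println("chaos injection failed:", err)
				return
			}
		case <-signChan:
			fmt.Println("abort signal received, reverting chaos")
			return
		case <-endTime:
			fmt.Println("chaos duration over, reverting chaos")
			break loop
		}
	}
	// the revert (kill of the stress process) would run here
}
```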
// injectChaosInParallelMode stressed the memory of all target application in parallel mode (all at once) // injectChaosInParallelMode stressed the memory of all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine // creating err channel to receive the error from the go routine
stressErr := make(chan error) stressErr := make(chan error)
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -212,10 +248,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Get the target container name of the application pod //Get the target container name of the application pod
//It checks the empty target container for the first iteration only //It checks the empty target container for the first iteration only
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{ log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@ -224,7 +257,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption, "Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
}) })
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr) go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
} }
} }
@ -243,13 +276,20 @@ loop:
log.Warn("Chaos process OOM killed") log.Warn("Chaos process OOM killed")
return nil return nil
} }
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
} }
case <-signChan: case <-signChan:
log.Info("[Chaos]: Revert Started") log.Info("[Chaos]: Revert Started")
if err := killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil { if err := killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err) log.Errorf("Error in Kill stress after abortion, err: %v", err)
} }
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed") log.Info("[Chaos]: Revert Completed")
os.Exit(1) os.Exit(1)
case <-endTime: case <-endTime:
@ -257,36 +297,12 @@ loop:
break loop break loop
} }
} }
return killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails) return killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails)
}
//PrepareMemoryExecStress contains the chaos prepration and injection steps
func PrepareMemoryExecStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
if err := experimentMemory(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
} }
// killStressMemorySerial function to kill a stress process running inside target container // killStressMemorySerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment //
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error { func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It will contain all the pod & container details required for exec command	// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{} execCommandDetails := litmusexec.PodDetails{}
@ -294,9 +310,9 @@ func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string
command := []string{"/bin/sh", "-c", memFreeCmd} command := []string{"/bin/sh", "-c", memFreeCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace) litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command) out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil { if err != nil {
return errors.Errorf("Unable to kill stress process inside target container, err: %v", err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
} }
common.SetTargets(podName, "reverted", "pod", chaosDetails) common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil return nil
@ -304,13 +320,15 @@ func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string
// killStressMemoryParallel function to kill all the stress process running inside target container // killStressMemoryParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment // Triggered by either timeout of chaos duration or termination of the experiment
func killStressMemoryParallel(containerName string, targetPodList corev1.PodList, namespace, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error { func killStressMemoryParallel(containerName string, targetPodList corev1.PodList, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
var errList []string
for _, pod := range targetPodList.Items { for _, pod := range targetPodList.Items {
if err := killStressMemorySerial(containerName, pod.Name, pod.Namespace, memFreeCmd, clients, chaosDetails); err != nil {
if err := killStressMemorySerial(containerName, pod.Name, namespace, memFreeCmd, clients, chaosDetails); err != nil { errList = append(errList, err.Error())
return err
} }
} }
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil return nil
} }

View File

@ -1,12 +1,14 @@
package lib package lib
import ( import (
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/palantir/stacktrace"
"strings" "strings"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib" network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"github.com/pkg/errors"
"gopkg.in/yaml.v2" "gopkg.in/yaml.v2"
corev1 "k8s.io/api/core/v1" corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1" networkv1 "k8s.io/api/networking/v1"
@ -52,12 +54,12 @@ func (np *NetworkPolicy) getNetworkPolicyDetails(experimentsDetails *experimentT
// sets the ports for the traffic control // sets the ports for the traffic control
if err := np.setPort(experimentsDetails.PORTS); err != nil { if err := np.setPort(experimentsDetails.PORTS); err != nil {
return err return stacktrace.Propagate(err, "could not set port")
} }
// sets the destination ips for which the traffic should be blocked // sets the destination ips for which the traffic should be blocked
if err := np.setExceptIPs(experimentsDetails); err != nil { if err := np.setExceptIPs(experimentsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not set ips")
} }
// sets the egress traffic rules // sets the egress traffic rules
@ -138,11 +140,11 @@ func (np *NetworkPolicy) setNamespaceSelector(nsLabel string) *NetworkPolicy {
// setPort sets all the protocols and ports // setPort sets all the protocols and ports
func (np *NetworkPolicy) setPort(p string) error { func (np *NetworkPolicy) setPort(p string) error {
ports := []networkv1.NetworkPolicyPort{} var ports []networkv1.NetworkPolicyPort
var port Port var port Port
// unmarshal the protocols and ports from the env // unmarshal the protocols and ports from the env
if err := yaml.Unmarshal([]byte(strings.TrimSpace(parseCommand(p))), &port); err != nil { if err := yaml.Unmarshal([]byte(strings.TrimSpace(parseCommand(p))), &port); err != nil {
return errors.Errorf("Unable to unmarshal, err: %v", err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("failed to unmarshal ports: %s", err.Error())}
} }
// sets all the tcp ports // sets all the tcp ports
@ -182,7 +184,7 @@ func (np *NetworkPolicy) setExceptIPs(experimentsDetails *experimentTypes.Experi
// get all the target ips // get all the target ips
destinationIPs, err := network_chaos.GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients.ClientSets{}, false) destinationIPs, err := network_chaos.GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients.ClientSets{}, false)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get destination ips")
} }
ips := strings.Split(destinationIPs, ",") ips := strings.Split(destinationIPs, ",")
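
setPort unmarshals the PORTS env and turns each entry into a networkv1.NetworkPolicyPort. A hedged sketch of that conversion for plain TCP/UDP port numbers (the parsing step is omitted and the port values are illustrative):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	networkv1 "k8s.io/api/networking/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// buildPorts turns plain TCP/UDP port numbers, as they might be parsed from
// the PORTS env, into NetworkPolicyPort entries for the egress/ingress rules.
func buildPorts(tcp, udp []int32) []networkv1.NetworkPolicyPort {
	var ports []networkv1.NetworkPolicyPort
	tcpProto, udpProto := corev1.ProtocolTCP, corev1.ProtocolUDP
	for _, p := range tcp {
		port := intstr.FromInt(int(p))
		ports = append(ports, networkv1.NetworkPolicyPort{Protocol: &tcpProto, Port: &port})
	}
	for _, p := range udp {
		port := intstr.FromInt(int(p))
		ports = append(ports, networkv1.NetworkPolicyPort{Protocol: &udpProto, Port: &port})
	}
	return ports
}

func main() {
	fmt.Println(buildPorts([]int32{80, 443}, []int32{53}))
}
```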

View File

@ -2,11 +2,18 @@ package lib
import ( import (
"context" "context"
"fmt"
"os" "os"
"os/signal" "os/signal"
"strings"
"syscall" "syscall"
"time" "time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
@ -15,7 +22,7 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry" "github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1" corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1" networkv1 "k8s.io/api/networking/v1"
@ -26,8 +33,10 @@ var (
inject, abort chan os.Signal inject, abort chan os.Signal
) )
//PrepareAndInjectChaos contains the prepration & injection steps	// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkPartitionFault")
defer span.End()
// inject channel is used to transmit signal notifications. // inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1) inject = make(chan os.Signal, 1)
@ -40,13 +49,14 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
signal.Notify(abort, os.Interrupt, syscall.SIGTERM) signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// validate the appLabels // validate the appLabels
if chaosDetails.AppDetail.Label == "" { if chaosDetails.AppDetail == nil {
return errors.Errorf("please provide the appLabel") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide the appLabel"}
} }
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
targetPodList, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: experimentsDetails.AppLabel}) targetPodList, err := common.GetPodList("", 100, clients, chaosDetails)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get target pods")
} }
podNames := []string{} podNames := []string{}
@ -56,7 +66,7 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
log.Infof("Target pods list for chaos, %v", podNames) log.Infof("Target pods list for chaos, %v", podNames)
// generate a unique string // generate a unique string
runID := common.GetRunID() runID := stringutils.GetRunID()
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
@ -67,7 +77,7 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
// collect all the data for the network policy // collect all the data for the network policy
np := initialize() np := initialize()
if err := np.getNetworkPolicyDetails(experimentsDetails); err != nil { if err := np.getNetworkPolicyDetails(experimentsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not get network policy details")
} }
//DISPLAY THE NETWORK POLICY DETAILS //DISPLAY THE NETWORK POLICY DETAILS
@ -81,11 +91,11 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
}) })
// watching for the abort signal and revert the chaos // watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, clients, chaosDetails, resultDetails, targetPodList, runID) go abortWatcher(experimentsDetails, clients, chaosDetails, resultDetails, &targetPodList, runID)
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -96,8 +106,8 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
os.Exit(0) os.Exit(0)
default: default:
// creating the network policy to block the traffic // creating the network policy to block the traffic
if err := createNetworkPolicy(experimentsDetails, clients, np, runID); err != nil { if err := createNetworkPolicy(ctx, experimentsDetails, clients, np, runID); err != nil {
return err return stacktrace.Propagate(err, "could not create network policy")
} }
// updating chaos status to injected for the target pods // updating chaos status to injected for the target pods
for _, pod := range targetPodList.Items { for _, pod := range targetPodList.Items {
@ -106,16 +116,16 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
} }
// verify the presence of network policy inside cluster // verify the presence of network policy inside cluster
if err := checkExistanceOfPolicy(experimentsDetails, clients, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil { if err := checkExistenceOfPolicy(experimentsDetails, clients, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return err return stacktrace.Propagate(err, "could not check existence of network policy")
} }
log.Infof("[Wait]: Wait for %v chaos duration", experimentsDetails.ChaosDuration) log.Infof("[Wait]: Wait for %v chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration) common.WaitForDuration(experimentsDetails.ChaosDuration)
// deleting the network policy after chaos duration over // deleting the network policy after chaos duration over
if err := deleteNetworkPolicy(experimentsDetails, clients, targetPodList, chaosDetails, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil { if err := deleteNetworkPolicy(experimentsDetails, clients, &targetPodList, chaosDetails, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return err return stacktrace.Propagate(err, "could not delete network policy")
} }
// updating chaos status to reverted for the target pods // updating chaos status to reverted for the target pods
@ -134,7 +144,9 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
// createNetworkPolicy creates the network policy in the application namespace // createNetworkPolicy creates the network policy in the application namespace
// it blocks ingress/egress traffic for the targeted application for specific/all IPs // it blocks ingress/egress traffic for the targeted application for specific/all IPs
func createNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, networkPolicy *NetworkPolicy, runID string) error { func createNetworkPolicy(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, networkPolicy *NetworkPolicy, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkPartitionFault")
defer span.End()
np := &networkv1.NetworkPolicy{ np := &networkv1.NetworkPolicy{
ObjectMeta: v1.ObjectMeta{ ObjectMeta: v1.ObjectMeta{
@ -157,7 +169,10 @@ func createNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
} }
_, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Create(context.Background(), np, v1.CreateOptions{}) _, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Create(context.Background(), np, v1.CreateOptions{})
return err if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to create network policy: %s", err.Error())}
}
return nil
} }
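
createNetworkPolicy assembles the policy object before creating it in the application namespace; the name and the "name=" label carry the experiment run ID so the revert path can find it again. A hedged sketch of that construction (label keys, run ID, and policy spec here are illustrative):

```go
package main

import (
	"fmt"

	networkv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildPartitionPolicy builds a NetworkPolicy that selects the target app's
// pods and is named/labeled with the experiment run ID.
func buildPartitionPolicy(experimentName, runID string, podLabels map[string]string, spec networkv1.NetworkPolicySpec) *networkv1.NetworkPolicy {
	spec.PodSelector = metav1.LabelSelector{MatchLabels: podLabels}
	return &networkv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{
			Name:   experimentName + "-np-" + runID,
			Labels: map[string]string{"name": experimentName + "-np-" + runID},
		},
		Spec: spec,
	}
}

func main() {
	np := buildPartitionPolicy("pod-network-partition", "abcde",
		map[string]string{"app": "nginx"},
		networkv1.NetworkPolicySpec{
			PolicyTypes: []networkv1.PolicyType{networkv1.PolicyTypeEgress, networkv1.PolicyTypeIngress},
		})
	fmt.Println(np.Name, np.Spec.PolicyTypes)
}
```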
// deleteNetworkPolicy deletes the network policy and wait until the network policy deleted completely // deleteNetworkPolicy deletes the network policy and wait until the network policy deleted completely
@ -165,7 +180,7 @@ func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
name := experimentsDetails.ExperimentName + "-np-" + runID name := experimentsDetails.ExperimentName + "-np-" + runID
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
if err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Delete(context.Background(), name, v1.DeleteOptions{}); err != nil { if err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Delete(context.Background(), name, v1.DeleteOptions{}); err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{name: %s, namespace: %s}", name, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to delete network policy: %s", err.Error())}
} }
err := retry. err := retry.
@ -173,8 +188,10 @@ func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
Wait(time.Duration(delay) * time.Second). Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error { Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels}) npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil || len(npList.Items) != 0 { if err != nil {
return errors.Errorf("Unable to delete the network policy, err: %v", err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) != 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "network policies are not deleted within timeout"}
} }
return nil return nil
}) })
@ -189,8 +206,8 @@ func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
return nil return nil
} }
// checkExistanceOfPolicy validate the presence of network policy inside the application namespace // checkExistenceOfPolicy validate the presence of network policy inside the application namespace
func checkExistanceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, timeout, delay int, runID string) error { func checkExistenceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, timeout, delay int, runID string) error {
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
return retry. return retry.
@ -198,8 +215,10 @@ func checkExistanceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetail
Wait(time.Duration(delay) * time.Second). Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error { Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels}) npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil || len(npList.Items) == 0 { if err != nil {
return errors.Errorf("no network policy found, err: %v", err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "no network policy found with matching labels"}
} }
return nil return nil
}) })
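
checkExistenceOfPolicy polls through the repo's retry helper (retry.Times(...).Wait(...).Try(...)) until the policy shows up or the attempts run out. A small standalone stand-in for that polling pattern:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// tryWithRetry is a simplified stand-in for the repo's retry helper: call fn
// up to `times` attempts, sleeping `wait` between failures, and return the
// last error if every attempt fails.
func tryWithRetry(times int, wait time.Duration, fn func(attempt int) error) error {
	var err error
	for attempt := 0; attempt < times; attempt++ {
		if err = fn(attempt); err == nil {
			return nil
		}
		time.Sleep(wait)
	}
	return err
}

func main() {
	// Polls for a condition the way checkExistenceOfPolicy polls for the
	// network policy: here the "policy" appears on the third attempt.
	found := 0
	err := tryWithRetry(5, 100*time.Millisecond, func(attempt int) error {
		found++
		if found < 3 {
			return errors.New("no network policy found with matching labels")
		}
		return nil
	})
	fmt.Println("result:", err)
}
```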
@ -215,8 +234,13 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients
// retry thrice for the chaos revert // retry thrice for the chaos revert
retry := 3 retry := 3
for retry > 0 { for retry > 0 {
if err := checkExistanceOfPolicy(experimentsDetails, clients, 2, 1, runID); err != nil { if err := checkExistenceOfPolicy(experimentsDetails, clients, 2, 1, runID); err != nil {
log.Infof("no active network policy found, err: %v", err) if error, ok := err.(cerrors.Error); ok {
if strings.Contains(error.Reason, "no network policy found with matching labels") {
break
}
}
log.Infof("no active network policy found, err: %v", err.Error())
retry-- retry--
continue continue
} }
@ -224,10 +248,12 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients
if err := deleteNetworkPolicy(experimentsDetails, clients, targetPodList, chaosDetails, 2, 1, runID); err != nil { if err := deleteNetworkPolicy(experimentsDetails, clients, targetPodList, chaosDetails, 2, 1, runID); err != nil {
log.Errorf("unable to delete network policy, err: %v", err) log.Errorf("unable to delete network policy, err: %v", err)
} }
retry--
} }
// updating the chaosresult after stopped // updating the chaosresult after stopped
failStep := "Chaos injection stopped!" err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep) failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT") result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("Chaos Revert Completed") log.Info("Chaos Revert Completed")
os.Exit(0) os.Exit(0)

View File

@ -0,0 +1,260 @@
package lib
import (
"fmt"
"go.opentelemetry.io/otel"
"golang.org/x/net/context"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/rds"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
)
var (
err error
inject, abort chan os.Signal
)
func PrepareRDSInstanceStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRDSInstanceStop")
defer span.End()
// Inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// Abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Get the instance identifier or list of instance identifiers
instanceIdentifierList := strings.Split(experimentsDetails.RDSInstanceIdentifier, ",")
if experimentsDetails.RDSInstanceIdentifier == "" || len(instanceIdentifierList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no RDS instance identifier found to stop"}
}
instanceIdentifierList = common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, instanceIdentifierList)
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIdentifierList))
// Watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIdentifierList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
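
PrepareRDSInstanceStop narrows the identifier list with common.FilterBasedOnPercentage before injecting. A hypothetical stand-in showing what percentage-based target selection of this kind can look like (the real helper may select differently):

```go
package main

import (
	"fmt"
	"math/rand"
)

// filterByPercentage picks roughly `perc` percent of the candidates (always
// at least one) at random; a stand-in for the repo's helper, not its code.
func filterByPercentage(perc int, items []string) []string {
	n := len(items) * perc / 100
	if n < 1 {
		n = 1
	}
	rand.Shuffle(len(items), func(i, j int) { items[i], items[j] = items[j], items[i] })
	return items[:n]
}

func main() {
	ids := []string{"db-1", "db-2", "db-3", "db-4"}
	fmt.Println(filterByPercentage(50, ids))
}
```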
// injectChaosInSerialMode stops the rds instances in serial mode, i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
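
The serial injector keeps cycling stop -> wait -> start over every target until ChaosDuration elapses. A simplified sketch of that duration loop, with hypothetical stop/start helpers standing in for the awslib calls and their wait counterparts:

```go
package main

import (
	"fmt"
	"time"
)

// stopInstance and startInstance are hypothetical stand-ins for the awslib
// stop/start + wait helpers used by the experiment.
func stopInstance(id string) error  { fmt.Println("stopping", id); return nil }
func startInstance(id string) error { fmt.Println("starting", id); return nil }

// runChaosLoop mirrors the serial-mode loop: keep cycling stop -> wait for
// the chaos interval -> start over every target until the duration elapses.
func runChaosLoop(targets []string, chaosDuration, chaosInterval time.Duration) error {
	start := time.Now()
	for time.Since(start) < chaosDuration {
		for _, id := range targets {
			if err := stopInstance(id); err != nil {
				return fmt.Errorf("instance failed to stop: %w", err)
			}
			time.Sleep(chaosInterval) // stands in for the chaos interval wait
			if err := startInstance(id); err != nil {
				return fmt.Errorf("instance failed to start: %w", err)
			}
		}
	}
	return nil
}

func main() {
	_ = runChaosLoop([]string{"db-1"}, 1*time.Second, 300*time.Millisecond)
}
```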
// injectChaosInParallelMode stops the rds instances in parallel mode, i.e. all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// PowerOff the instance
for _, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
// Run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
for _, identifier := range instanceIdentifierList {
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// watching for the abort signal and revert the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, identifier := range instanceIdentifierList {
instanceState, err := awslib.GetRDSInstanceStatus(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" {
log.Info("[Abort]: Waiting for the RDS instance to get down")
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting RDS instance as abort signal received")
err := awslib.RDSInstanceStart(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("RDS instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

View File

@ -1,31 +1,38 @@
package lib package lib
import ( import (
"context"
"fmt" "fmt"
"time" "time"
redfishLib "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish" redfishLib "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish-node-restart/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish-node-restart/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
) )
//injectChaos initiates node restart chaos on the target node // injectChaos initiates node restart chaos on the target node
func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error { func injectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectRedfishNodeRestartFault")
defer span.End()
URL := fmt.Sprintf("https://%v/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset", experimentsDetails.IPMIIP) URL := fmt.Sprintf("https://%v/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset", experimentsDetails.IPMIIP)
return redfishLib.RebootNode(URL, experimentsDetails.User, experimentsDetails.Password) return redfishLib.RebootNode(URL, experimentsDetails.User, experimentsDetails.Password)
} }
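
injectChaos delegates the actual reboot to redfishLib.RebootNode against the ComputerSystem.Reset action URL. A hedged sketch of what such a Redfish reset call commonly looks like (this is not the repo's implementation; supported ResetType values vary by BMC):

```go
package main

import (
	"bytes"
	"crypto/tls"
	"fmt"
	"net/http"
)

// rebootNode POSTs the standard Redfish ComputerSystem.Reset action with
// basic auth against the BMC's action URL.
func rebootNode(url, user, password string) error {
	body := bytes.NewBufferString(`{"ResetType": "ForceRestart"}`)
	req, err := http.NewRequest(http.MethodPost, url, body)
	if err != nil {
		return err
	}
	req.SetBasicAuth(user, password)
	req.Header.Set("Content-Type", "application/json")

	// BMCs commonly serve self-signed certificates, hence the insecure client here.
	client := &http.Client{Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}}}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	fmt.Println(rebootNode("https://bmc.example/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset", "root", "password"))
}
```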
//experimentExecution function orchestrates the experiment by calling the injectChaos function // experimentExecution function orchestrates the experiment by calling the injectChaos function
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@@ -36,17 +43,19 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine") events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
} }
if err := injectChaos(experimentsDetails, clients); err != nil { if err := injectChaos(ctx, experimentsDetails, clients); err != nil {
return err return stacktrace.Propagate(err, "chaos injection failed")
} }
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration) log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
time.Sleep(time.Duration(experimentsDetails.ChaosDuration) * time.Second) time.Sleep(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
return nil return nil
} }
//PrepareChaos contains the chaos prepration and injection steps // PrepareChaos contains the chaos prepration and injection steps
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRedfishNodeRestartFault")
defer span.End()
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
@@ -54,7 +63,7 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
common.WaitForDuration(experimentsDetails.RampTime) common.WaitForDuration(experimentsDetails.RampTime)
} }
//Starting the Redfish node restart experiment //Starting the Redfish node restart experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return err
} }
common.SetTargets(experimentsDetails.IPMIIP, "targeted", "node", chaosDetails) common.SetTargets(experimentsDetails.IPMIIP, "targeted", "node", chaosDetails)
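The change above threads a context.Context through PrepareChaos, experimentExecution, and injectChaos and opens an OpenTelemetry span at the top of each. A minimal sketch of that span-per-phase pattern; the tracer name and function names here are illustrative, not the litmus-go ones:

package main

import (
	"context"

	"go.opentelemetry.io/otel"
)

// prepareFault opens a span for the preparation phase and passes the
// derived context down so child spans nest under it.
func prepareFault(ctx context.Context) error {
	ctx, span := otel.Tracer("example-tracer").Start(ctx, "PrepareFault")
	defer span.End()
	return injectFault(ctx)
}

// injectFault opens a child span for the injection phase.
func injectFault(ctx context.Context) error {
	_, span := otel.Tracer("example-tracer").Start(ctx, "InjectFault")
	defer span.End()
	// Fault-specific work would happen here.
	return nil
}

func main() {
	// Without a configured tracer provider this uses the no-op tracer,
	// which is enough to show how the contexts are chained.
	_ = prepareFault(context.Background())
}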

View File

@@ -2,9 +2,9 @@ package lib
import ( import (
"bytes" "bytes"
"context"
"encoding/json" "encoding/json"
"fmt" "fmt"
corev1 "k8s.io/api/core/v1"
"net/http" "net/http"
"os" "os"
"os/signal" "os/signal"
@@ -12,6 +12,12 @@ import (
"syscall" "syscall"
"time" "time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
"github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
@@ -20,7 +26,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/spring-boot/spring-boot-chaos/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/spring-boot/spring-boot-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
) )
@@ -38,8 +43,8 @@ func SetTargetPodList(experimentsDetails *experimentTypes.ExperimentDetails, cli
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
var err error var err error
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
} }
if experimentsDetails.TargetPodList, err = common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails); err != nil { if experimentsDetails.TargetPodList, err = common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails); err != nil {
return err return err
@@ -49,7 +54,10 @@ func SetTargetPodList(experimentsDetails *experimentTypes.ExperimentDetails, cli
} }
// PrepareChaos contains the preparation steps before chaos injection // PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareSpringBootFault")
defer span.End()
// Waiting for the ramp time before chaos injection // Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime) log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@@ -64,25 +72,18 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
"Controller": experimentsDetails.ChaosMonkeyWatchers.Controller, "Controller": experimentsDetails.ChaosMonkeyWatchers.Controller,
"RestController": experimentsDetails.ChaosMonkeyWatchers.RestController, "RestController": experimentsDetails.ChaosMonkeyWatchers.RestController,
}) })
log.InfoWithValues("[Info]: Chaos monkeys assaults will be injected to the target pods as follows", logrus.Fields{
"CPU Assault": experimentsDetails.ChaosMonkeyAssault.CPUActive,
"Memory Assault": experimentsDetails.ChaosMonkeyAssault.MemoryActive,
"Kill App Assault": experimentsDetails.ChaosMonkeyAssault.KillApplicationActive,
"Latency Assault": experimentsDetails.ChaosMonkeyAssault.LatencyActive,
"Exception Assault": experimentsDetails.ChaosMonkeyAssault.ExceptionsActive,
})
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err := injectChaosInSerialMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil { if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil { if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
// Waiting for the ramp time after chaos injection // Waiting for the ramp time after chaos injection
@@ -98,25 +99,30 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
func CheckChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, targetPods corev1.PodList) (bool, error) { func CheckChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, targetPods corev1.PodList) (bool, error) {
hasErrors := false hasErrors := false
targetPodNames := []string{}
for _, pod := range targetPods.Items { for _, pod := range targetPods.Items {
targetPodNames = append(targetPodNames, pod.Name)
endpoint := "http://" + pod.Status.PodIP + ":" + chaosMonkeyPort + chaosMonkeyPath endpoint := "http://" + pod.Status.PodIP + ":" + chaosMonkeyPort + chaosMonkeyPath
log.Infof("[Check]: Checking pod: %v (endpoint: %v)", pod.Name, endpoint) log.Infof("[Check]: Checking pod: %v (endpoint: %v)", pod.Name, endpoint)
resp, err := http.Get(endpoint) resp, err := http.Get(endpoint)
if err != nil { if err != nil {
log.Errorf("failed to request chaos monkey endpoint on pod %v (err: %v)", pod.Name, resp.StatusCode) log.Errorf("failed to request chaos monkey endpoint on pod %s, %s", pod.Name, err.Error())
hasErrors = true hasErrors = true
continue continue
} }
if resp.StatusCode != 200 { if resp.StatusCode != 200 {
log.Errorf("failed to get chaos monkey endpoint on pod %v (status: %v)", pod.Name, resp.StatusCode) log.Errorf("failed to get chaos monkey endpoint on pod %s (status: %d)", pod.Name, resp.StatusCode)
hasErrors = true hasErrors = true
} }
} }
if hasErrors { if hasErrors {
return false, errors.Errorf("failed to check chaos moonkey on at least one pod, check logs for details") return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{podNames: %s}", targetPodNames), Reason: "failed to check chaos monkey on at least one pod, check logs for details"}
} }
return true, nil return true, nil
} }
@@ -130,7 +136,7 @@ func enableChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, pod corev
} }
if resp.StatusCode != 200 { if resp.StatusCode != 200 {
return errors.Errorf("failed to enable chaos monkey endpoint on pod %v (status: %v)", pod.Name, resp.StatusCode) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to enable chaos monkey endpoint (status: %d)", resp.StatusCode)}
} }
return nil return nil
@@ -141,37 +147,33 @@ func setChaosMonkeyWatchers(chaosMonkeyPort string, chaosMonkeyPath string, watc
jsonValue, err := json.Marshal(watchers) jsonValue, err := json.Marshal(watchers)
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey watchers, %s", err.Error())}
} }
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/watchers", "application/json", bytes.NewBuffer(jsonValue)) resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/watchers", "application/json", bytes.NewBuffer(jsonValue))
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set watchers, %s", err.Error())}
} }
if resp.StatusCode != 200 { if resp.StatusCode != 200 {
return errors.Errorf("failed to set assault on pod %v (status: %v)", pod.Name, resp.StatusCode) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set assault (status: %d)", resp.StatusCode)}
} }
return nil return nil
} }
func startAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault experimentTypes.ChaosMonkeyAssault, pod corev1.Pod) error { func startAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault []byte, pod corev1.Pod) error {
jsonValue, err := json.Marshal(assault) if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, assault, pod); err != nil {
if err != nil {
return err
}
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, jsonValue, pod); err != nil {
return err return err
} }
log.Infof("[Chaos]: Activating Chaos Monkey assault on pod: %v", pod.Name) log.Infof("[Chaos]: Activating Chaos Monkey assault on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults/runtime/attack", "", nil) resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults/runtime/attack", "", nil)
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to start assault %s", err.Error())}
} }
if resp.StatusCode != 200 { if resp.StatusCode != 200 {
return errors.Errorf("failed to activate runtime attack on pod %v (status: %v)", pod.Name, resp.StatusCode) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to activate runtime attack (status: %d)", resp.StatusCode)}
} }
return nil return nil
} }
@@ -181,45 +183,47 @@ func setChaosMonkeyAssault(chaosMonkeyPort string, chaosMonkeyPath string, assau
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults", "application/json", bytes.NewBuffer(assault)) resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults", "application/json", bytes.NewBuffer(assault))
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set assault, %s", err.Error())}
} }
if resp.StatusCode != 200 { if resp.StatusCode != 200 {
return errors.Errorf("failed to set assault on pod %v (status: %v)", pod.Name, resp.StatusCode) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set assault (status: %d)", resp.StatusCode)}
} }
return nil return nil
} }
// disableChaosMonkey disables chaos monkey on selected pods // disableChaosMonkey disables chaos monkey on selected pods
func disableChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error { func disableChaosMonkey(ctx context.Context, chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error {
log.Infof("[Chaos]: disabling assaults on pod %v", pod.Name) log.Infof("[Chaos]: disabling assaults on pod %s", pod.Name)
jsonValue, err := json.Marshal(revertAssault) jsonValue, err := json.Marshal(revertAssault)
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey revert-chaos watchers, %s", err.Error())}
} }
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, jsonValue, pod); err != nil { if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, jsonValue, pod); err != nil {
return err return err
} }
log.Infof("[Chaos]: disabling chaos monkey on pod %v", pod.Name) log.Infof("[Chaos]: disabling chaos monkey on pod %s", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/disable", "", nil) resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/disable", "", nil)
if err != nil { if err != nil {
return err return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to disable assault, %s", err.Error())}
} }
if resp.StatusCode != 200 { if resp.StatusCode != 200 {
return errors.Errorf("failed to disable chaos monkey endpoint on pod %v (status: %v)", pod.Name, resp.StatusCode) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to disable chaos monkey endpoint (status: %d)", resp.StatusCode)}
} }
return nil return nil
} }
// injectChaosInSerialMode injects chaos monkey assault on pods in serial mode(one by one) // injectChaosInSerialMode injects chaos monkey assault on pods in serial mode(one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInSerialMode")
defer span.End()
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@@ -273,14 +277,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
select { select {
case <-signChan: case <-signChan:
log.Info("[Chaos]: Revert Started") log.Info("[Chaos]: Revert Started")
if err := disableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil { if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err) log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else { } else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails) common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
} }
// updating the chaosresult after stopped // updating the chaosresult after stopped
failStep := "Chaos injection stopped!" failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep) types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT") result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed") log.Info("[Chaos]: Revert Completed")
os.Exit(1) os.Exit(1)
@@ -291,8 +295,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
} }
if err := disableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil { if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
return fmt.Errorf("error in disabling chaos monkey, err: %v", err) return err
} }
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails) common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
@@ -303,11 +307,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
// injectChaosInParallelMode injects chaos monkey assault on pods in parallel mode (all at once) // injectChaosInParallelMode injects chaos monkey assault on pods in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInParallelMode")
defer span.End()
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@@ -338,16 +344,17 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
}) })
if err := setChaosMonkeyWatchers(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyWatchers, pod); err != nil { if err := setChaosMonkeyWatchers(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyWatchers, pod); err != nil {
return errors.Errorf("[Chaos]: Failed to set watchers, err: %v ", err) log.Errorf("[Chaos]: Failed to set watchers, err: %v", err)
return err
} }
if err := startAssault(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyAssault, pod); err != nil { if err := startAssault(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyAssault, pod); err != nil {
log.Errorf("[Chaos]: Failed to set assault, err: %v ", err) log.Errorf("[Chaos]: Failed to set assault, err: %v", err)
return err return err
} }
if err := enableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil { if err := enableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("[Chaos]: Failed to enable chaos, err: %v ", err) log.Errorf("[Chaos]: Failed to enable chaos, err: %v", err)
return err return err
} }
common.SetTargets(pod.Name, "injected", "pod", chaosDetails) common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
@@ -361,7 +368,7 @@ loop:
case <-signChan: case <-signChan:
log.Info("[Chaos]: Revert Started") log.Info("[Chaos]: Revert Started")
for _, pod := range experimentsDetails.TargetPodList.Items { for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil { if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err) log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else { } else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails) common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
@@ -369,7 +376,7 @@ loop:
} }
// updating the chaosresult after stopped // updating the chaosresult after stopped
failStep := "Chaos injection stopped!" failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep) types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT") result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed") log.Info("[Chaos]: Revert Completed")
os.Exit(1) os.Exit(1)
@@ -382,7 +389,7 @@ loop:
var errorList []string var errorList []string
for _, pod := range experimentsDetails.TargetPodList.Items { for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil { if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
errorList = append(errorList, err.Error()) errorList = append(errorList, err.Error())
continue continue
} }
@@ -390,7 +397,7 @@ loop:
} }
if len(errorList) != 0 { if len(errorList) != 0 {
return fmt.Errorf("error in disabling chaos monkey, err: %v", strings.Join(errorList, ", ")) return cerrors.PreserveError{ErrString: fmt.Sprintf("error in disabling chaos monkey, [%s]", strings.Join(errorList, ","))}
} }
return nil return nil
} }
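The Spring Boot fault drives Chaos Monkey over HTTP (setting watchers and assaults, then enabling or disabling it) and the change above wraps every failure and non-200 response in a structured cerrors.Error. A minimal sketch of the underlying request/status-check pattern with plain errors; the pod IP, port, and endpoint path are placeholders:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// postJSON sends a JSON payload to a Chaos Monkey style endpoint and
// treats any non-200 status as an error, mirroring the checks above.
func postJSON(url string, payload any) error {
	body, err := json.Marshal(payload)
	if err != nil {
		return fmt.Errorf("marshal payload: %w", err)
	}
	resp, err := http.Post(url, "application/json", bytes.NewBuffer(body))
	if err != nil {
		return fmt.Errorf("call %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d from %s", resp.StatusCode, url)
	}
	return nil
}

func main() {
	// Hypothetical watcher toggles posted to a placeholder pod address.
	watchers := map[string]bool{"controller": true, "restController": false}
	if err := postJSON("http://10.0.0.12:8080/actuator/chaosmonkey/watchers", watchers); err != nil {
		fmt.Println("failed:", err)
	}
}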

View File

@@ -3,6 +3,7 @@ package helper
import ( import (
"bufio" "bufio"
"bytes" "bytes"
"context"
"fmt" "fmt"
"io" "io"
"os" "os"
@@ -16,19 +17,26 @@ import (
"github.com/containerd/cgroups" "github.com/containerd/cgroups"
cgroupsv2 "github.com/containerd/cgroups/v2" cgroupsv2 "github.com/containerd/cgroups/v2"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
clients "github.com/litmuschaos/litmus-go/pkg/clients" clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result" "github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
clientTypes "k8s.io/apimachinery/pkg/types"
) )
//list of cgroups in a container // list of cgroups in a container
var ( var (
cgroupSubsystemList = []string{"cpu", "memory", "systemd", "net_cls", cgroupSubsystemList = []string{"cpu", "memory", "systemd", "net_cls",
"net_prio", "freezer", "blkio", "perf_event", "devices", "cpuset", "net_prio", "freezer", "blkio", "perf_event", "devices", "cpuset",
@@ -49,7 +57,9 @@ const (
) )
// Helper injects the stress chaos // Helper injects the stress chaos
func Helper(clients clients.ClientSets) { func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodStressFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{} experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{} eventsDetails := types.EventDetails{}
@@ -72,6 +82,7 @@ func Helper(clients clients.ClientSets) {
// Intialise the chaos attributes // Intialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails) types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise Chaos Result Parameters // Intialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails) types.SetResultAttributes(&resultDetails, chaosDetails)
@@ -80,145 +91,260 @@ func Helper(clients clients.ClientSets) {
result.SetResultUID(&resultDetails, clients, &chaosDetails) result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := prepareStressChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil { if err := prepareStressChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err) log.Fatalf("helper pod failed, err: %v", err)
} }
} }
//prepareStressChaos contains the chaos preparation and injection steps // prepareStressChaos contains the chaos preparation and injection steps
func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error { func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
// get stressors in list format
stressorList := prepareStressor(experimentsDetails)
if len(stressorList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "fail to prepare stressors"}
}
stressors := strings.Join(stressorList, " ")
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []*targetDetails
for _, t := range targetList.Target {
td := &targetDetails{
Name: t.Name,
Namespace: t.Namespace,
Source: chaosDetails.ChaosPodName,
}
td.TargetContainers, err = common.GetTargetContainers(t.Name, t.Namespace, t.TargetContainer, chaosDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get target containers")
}
td.ContainerIds, err = common.GetContainerIDs(td.Namespace, td.Name, td.TargetContainers, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container ids")
}
for _, cid := range td.ContainerIds {
// extract out the pid of the target container
pid, err := common.GetPID(experimentsDetails.ContainerRuntime, cid, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
td.Pids = append(td.Pids, pid)
}
for i := range td.Pids {
cGroupManagers, err, grpPath := getCGroupManager(td, i)
if err != nil {
return stacktrace.Propagate(err, "could not get cgroup manager")
}
td.GroupPath = grpPath
td.CGroupManagers = append(td.CGroupManagers, cGroupManagers)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"TargetContainers": td.TargetContainers,
})
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select { select {
case <-inject: case <-inject:
// stopping the chaos execution, if abort signal received // stopping the chaos execution, if abort signal received
os.Exit(1) os.Exit(1)
default: default:
}
containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients) done := make(chan error, 1)
if err != nil {
return err
}
// extract out the pid of the target container
targetPID, err := common.GetPID(experimentsDetails.ContainerRuntime, containerID, experimentsDetails.SocketPath)
if err != nil {
return err
}
// record the event inside chaosengine for index, t := range targets {
if experimentsDetails.EngineName != "" { for i := range t.Pids {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod" cmd, err := injectChaos(t, stressors, i, experimentsDetails.StressType)
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
cgroupManager, err := getCGroupManager(int(targetPID), containerID)
if err != nil {
return errors.Errorf("fail to get the cgroup manager, err: %v", err)
}
// get stressors in list format
stressorList := prepareStressor(experimentsDetails)
if len(stressorList) == 0 {
return errors.Errorf("fail to prepare stressor for %v experiment", experimentsDetails.ExperimentName)
}
stressors := strings.Join(stressorList, " ")
stressCommand := "pause nsutil -t " + strconv.Itoa(targetPID) + " -p -- " + stressors
log.Infof("[Info]: starting process: %v", stressCommand)
// launch the stress-ng process on the target container in paused mode
cmd := exec.Command("/bin/bash", "-c", stressCommand)
// enables the process group id
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
var buf bytes.Buffer
cmd.Stdout = &buf
err = cmd.Start()
if err != nil {
return errors.Errorf("fail to start the stress process %v, err: %v", stressCommand, err)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(cmd.Process.Pid, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails.TargetPods)
// add the stress process to the cgroup of target container
if err = addProcessToCgroup(cmd.Process.Pid, cgroupManager); err != nil {
if killErr := cmd.Process.Kill(); killErr != nil {
return errors.Errorf("stressors failed killing %v process, err: %v", cmd.Process.Pid, killErr)
}
return errors.Errorf("fail to add the stress process into target container cgroup, err: %v", err)
}
log.Info("[Info]: Sending signal to resume the stress process")
// wait for the process to start before sending the resume signal
// TODO: need a dynamic way to check the start of the process
time.Sleep(700 * time.Millisecond)
// remove pause and resume or start the stress process
if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
return errors.Errorf("fail to remove pause and start the stress process: %v", err)
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
done := make(chan error)
go func() { done <- cmd.Wait() }()
// check the timeout for the command
// Note: timeout will occur when process didn't complete even after 10s of chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
if err = terminateProcess(cmd.Process.Pid); err != nil {
return err
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
return nil
case err := <-done:
if err != nil { if err != nil {
err, ok := err.(*exec.ExitError) if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
if ok { return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
status := err.Sys().(syscall.WaitStatus) }
if status.Signaled() && status.Signal() == syscall.SIGKILL { return stacktrace.Propagate(err, "could not inject chaos")
// wait for the completion of abort handler }
time.Sleep(10 * time.Second) targets[index].Cmds = append(targets[index].Cmds, cmd)
return errors.Errorf("process stopped with SIGKILL signal") log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
go func() {
var errList []string
var exitErr error
for _, t := range targets {
for i := range t.Cmds {
if err := t.Cmds[i].Cmd.Wait(); err != nil {
log.Infof("stress process failed, err: %v, out: %v", err, t.Cmds[i].Buffer.String())
if _, ok := err.(*exec.ExitError); ok {
exitErr = err
continue
}
errList = append(errList, err.Error())
}
}
}
if exitErr != nil {
oomKilled, err := checkOOMKilled(targets, clients, exitErr)
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- exitErr
}
done <- nil
} else if len(errList) != 0 {
oomKilled, err := checkOOMKilled(targets, clients, fmt.Errorf("err: %v", strings.Join(errList, ", ")))
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
}
done <- nil
} else {
done <- nil
}
}()
// check the timeout for the command
// Note: timeout will occur when process didn't complete even after 10s of chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
case err := <-done:
if err != nil {
exitErr, ok := err.(*exec.ExitError)
if ok {
status := exitErr.Sys().(syscall.WaitStatus)
if status.Signaled() {
log.Infof("process stopped with signal: %v", status.Signal())
}
if status.Signaled() && status.Signal() == syscall.SIGKILL {
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("process stopped with SIGTERM signal")}
}
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: chaosDetails.ChaosPodName, Reason: err.Error()}
}
log.Info("[Info]: Reverting Chaos")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
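After injection, the helper waits on a done channel fed by the stress processes but caps the wait at the chaos duration plus a 30s grace period, then force-reverts on timeout. A minimal sketch of that wait-with-deadline pattern; the goroutine body is a placeholder for waiting on the real commands:

package main

import (
	"fmt"
	"time"
)

func main() {
	chaosDuration := 5 * time.Second

	// done is buffered so the waiting goroutine never blocks on send.
	done := make(chan error, 1)
	go func() {
		// Placeholder for cmd.Wait() on every injected stress process.
		time.Sleep(chaosDuration)
		done <- nil
	}()

	// Allow extra headroom past the chaos duration before force-reverting.
	timeout := time.After(chaosDuration + 30*time.Second)

	select {
	case <-timeout:
		fmt.Println("[Timeout]: stress process still running, force revert")
	case err := <-done:
		if err != nil {
			fmt.Println("stress process failed:", err)
		} else {
			fmt.Println("[Info]: chaos completed, reverting")
		}
	}
}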
func revertChaosForAllTargets(targets []*targetDetails, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
if err := terminateProcess(targets[i]); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// checkOOMKilled checks if the container within the target pods failed due to an OOMKilled error.
func checkOOMKilled(targets []*targetDetails, clients clients.ClientSets, chaosError error) (bool, error) {
// Check each container in the pod
for i := 0; i < 3; i++ {
for _, t := range targets {
// Fetch the target pod
targetPod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return false, cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace),
Reason: err.Error(),
}
}
for _, c := range targetPod.Status.ContainerStatuses {
if utils.Contains(c.Name, t.TargetContainers) {
// Check for OOMKilled and restart
if c.LastTerminationState.Terminated != nil && c.LastTerminationState.Terminated.ExitCode == 137 {
log.Warnf("[Warning]: The target container '%s' of pod '%s' got OOM Killed, err: %v", c.Name, t.Name, chaosError)
return true, nil
} }
} }
return errors.Errorf("process exited before the actual cleanup, err: %v", err)
}
log.Info("[Info]: Chaos injection completed")
if err := terminateProcess(cmd.Process.Pid); err != nil {
return err
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
return err
} }
} }
time.Sleep(1 * time.Second)
}
return false, nil
}
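checkOOMKilled above inspects each target container's last termination state and treats exit code 137 as an OOM kill rather than a chaos failure. A minimal client-go sketch of the same check, assuming in-cluster credentials; the namespace and pod name are placeholders:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// wasOOMKilled reports whether any container in the pod last terminated
// with exit code 137 (SIGKILL from the kernel OOM killer).
func wasOOMKilled(client kubernetes.Interface, ns, name string) (bool, error) {
	pod, err := client.CoreV1().Pods(ns).Get(context.Background(), name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	for _, cs := range pod.Status.ContainerStatuses {
		term := cs.LastTerminationState.Terminated
		if term != nil && term.ExitCode == 137 {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	oom, err := wasOOMKilled(client, "default", "target-pod")
	fmt.Println(oom, err)
}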
// terminateProcess will remove the stress process from the target container after chaos completion
func terminateProcess(t *targetDetails) error {
var errList []string
for i := range t.Cmds {
if t.Cmds[i] != nil && t.Cmds[i].Cmd.Process != nil {
if err := syscall.Kill(-t.Cmds[i].Cmd.Process.Pid, syscall.SIGKILL); err != nil {
if strings.Contains(err.Error(), ProcessAlreadyKilled) || strings.Contains(err.Error(), ProcessAlreadyFinished) {
continue
}
errList = append(errList, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[i]), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}.Error())
continue
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
} }
return nil return nil
} }
//terminateProcess will remove the stress process from the target container after chaos completion // prepareStressor will set the required stressors for the given experiment
func terminateProcess(pid int) error {
if err := syscall.Kill(-pid, syscall.SIGKILL); err != nil {
if strings.Contains(err.Error(), ProcessAlreadyKilled) || strings.Contains(err.Error(), ProcessAlreadyFinished) {
return nil
}
return err
}
log.Info("[Info]: Stress process removed successfully")
return nil
}
//prepareStressor will set the required stressors for the given experiment
func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []string { func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []string {
stressArgs := []string{ stressArgs := []string{
@@ -281,33 +407,33 @@ func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []str
} }
default: default:
log.Fatalf("stressor for %v experiment is not suported", experimentDetails.ExperimentName) log.Fatalf("stressor for %v experiment is not supported", experimentDetails.ExperimentName)
} }
return stressArgs return stressArgs
} }
//pidPath will get the pid path of the container // pidPath will get the pid path of the container
func pidPath(pid int) cgroups.Path { func pidPath(t *targetDetails, index int) cgroups.Path {
processPath := "/proc/" + strconv.Itoa(pid) + "/cgroup" processPath := "/proc/" + strconv.Itoa(t.Pids[index]) + "/cgroup"
paths, err := parseCgroupFile(processPath) paths, err := parseCgroupFile(processPath, t, index)
if err != nil { if err != nil {
return getErrorPath(errors.Wrapf(err, "parse cgroup file %s", processPath)) return getErrorPath(errors.Wrapf(err, "parse cgroup file %s", processPath))
} }
return getExistingPath(paths, pid, "") return getExistingPath(paths, t.Pids[index], "")
} }
//parseCgroupFile will read and verify the cgroup file entry of a container // parseCgroupFile will read and verify the cgroup file entry of a container
func parseCgroupFile(path string) (map[string]string, error) { func parseCgroupFile(path string, t *targetDetails, index int) (map[string]string, error) {
file, err := os.Open(path) file, err := os.Open(path)
if err != nil { if err != nil {
return nil, errors.Errorf("unable to parse cgroup file: %v", err) return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to parse cgroup: %s", err.Error())}
} }
defer file.Close() defer file.Close()
return parseCgroupFromReader(file) return parseCgroupFromReader(file, t, index)
} }
//parseCgroupFromReader will parse the cgroup file from the reader // parseCgroupFromReader will parse the cgroup file from the reader
func parseCgroupFromReader(r io.Reader) (map[string]string, error) { func parseCgroupFromReader(r io.Reader, t *targetDetails, index int) (map[string]string, error) {
var ( var (
cgroups = make(map[string]string) cgroups = make(map[string]string)
s = bufio.NewScanner(r) s = bufio.NewScanner(r)
@@ -318,7 +444,7 @@ func parseCgroupFromReader(r io.Reader) (map[string]string, error) {
parts = strings.SplitN(text, ":", 3) parts = strings.SplitN(text, ":", 3)
) )
if len(parts) < 3 { if len(parts) < 3 {
return nil, errors.Errorf("invalid cgroup entry: %q", text) return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("invalid cgroup entry: %q", text)}
} }
for _, subs := range strings.Split(parts[1], ",") { for _, subs := range strings.Split(parts[1], ",") {
if subs != "" { if subs != "" {
@@ -327,13 +453,13 @@ func parseCgroupFromReader(r io.Reader) (map[string]string, error) {
} }
} }
if err := s.Err(); err != nil { if err := s.Err(); err != nil {
return nil, errors.Errorf("buffer scanner failed: %v", err) return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("buffer scanner failed: %s", err.Error())}
} }
return cgroups, nil return cgroups, nil
} }
//getExistingPath will be used to get the existing valid cgroup path // getExistingPath will be used to get the existing valid cgroup path
func getExistingPath(paths map[string]string, pid int, suffix string) cgroups.Path { func getExistingPath(paths map[string]string, pid int, suffix string) cgroups.Path {
for n, p := range paths { for n, p := range paths {
dest, err := getCgroupDestination(pid, n) dest, err := getCgroupDestination(pid, n)
@@ -363,14 +489,14 @@ func getExistingPath(paths map[string]string, pid int, suffix string) cgroups.Pa
} }
} }
//getErrorPath will give the invalid cgroup path // getErrorPath will give the invalid cgroup path
func getErrorPath(err error) cgroups.Path { func getErrorPath(err error) cgroups.Path {
return func(_ cgroups.Name) (string, error) { return func(_ cgroups.Name) (string, error) {
return "", err return "", err
} }
} }
//getCgroupDestination will validate the subsystem with the mountpath in container mountinfo file. // getCgroupDestination will validate the subsystem with the mountpath in container mountinfo file.
func getCgroupDestination(pid int, subsystem string) (string, error) { func getCgroupDestination(pid int, subsystem string) (string, error) {
mountinfoPath := fmt.Sprintf("/proc/%d/mountinfo", pid) mountinfoPath := fmt.Sprintf("/proc/%d/mountinfo", pid)
file, err := os.Open(mountinfoPath) file, err := os.Open(mountinfoPath)
@@ -393,28 +519,25 @@ func getCgroupDestination(pid int, subsystem string) (string, error) {
return "", errors.Errorf("no destination found for %v ", subsystem) return "", errors.Errorf("no destination found for %v ", subsystem)
} }
//findValidCgroup will be used to get a valid cgroup path // findValidCgroup will be used to get a valid cgroup path
func findValidCgroup(path cgroups.Path, target string) (string, error) { func findValidCgroup(path cgroups.Path, t *targetDetails, index int) (string, error) {
for _, subsystem := range cgroupSubsystemList { for _, subsystem := range cgroupSubsystemList {
path, err := path(cgroups.Name(subsystem)) path, err := path(cgroups.Name(subsystem))
if err != nil { if err != nil {
log.Errorf("fail to retrieve the cgroup path, subsystem: %v, target: %v, err: %v", subsystem, target, err) log.Errorf("fail to retrieve the cgroup path, subsystem: %v, target: %v, err: %v", subsystem, t.ContainerIds[index], err)
continue continue
} }
if strings.Contains(path, target) { if strings.Contains(path, t.ContainerIds[index]) {
return path, nil return path, nil
} }
} }
return "", errors.Errorf("never found valid cgroup for %s", target) return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: "could not find valid cgroup"}
} }
//getENV fetches all the env variables from the runner pod // getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) { func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "") experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "") experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus") experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "") experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@@ -433,7 +556,7 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
} }
// abortWatcher continuously watch for the abort signals // abortWatcher continuously watch for the abort signals
func abortWatcher(targetPID int, resultName, chaosNS, targetPodName string) { func abortWatcher(targets []*targetDetails, resultName, chaosNS string) {
<-abort <-abort
@@ -442,53 +565,133 @@ func abortWatcher(targetPID int, resultName, chaosNS, targetPodName string) {
// retry thrice for the chaos revert // retry thrice for the chaos revert
retry := 3 retry := 3
for retry > 0 { for retry > 0 {
if err = terminateProcess(targetPID); err != nil { for _, t := range targets {
log.Errorf("unable to kill stress process, err :%v", err) if err = terminateProcess(t); err != nil {
log.Errorf("[Abort]: unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("[Abort]: Unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
} }
retry-- retry--
time.Sleep(1 * time.Second) time.Sleep(1 * time.Second)
} }
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", targetPodName); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
log.Info("[Abort]: Chaos Revert Completed") log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1) os.Exit(1)
} }
// getCGroupManager will return the cgroup for the given pid of the process // getCGroupManager will return the cgroup for the given pid of the process
func getCGroupManager(pid int, containerID string) (interface{}, error) { func getCGroupManager(t *targetDetails, index int) (interface{}, error, string) {
if cgroups.Mode() == cgroups.Unified { if cgroups.Mode() == cgroups.Unified {
groupPath, err := cgroupsv2.PidGroupPath(pid) groupPath := ""
output, err := exec.Command("bash", "-c", fmt.Sprintf("nsenter -t 1 -C -m -- cat /proc/%v/cgroup", t.Pids[index])).CombinedOutput()
if err != nil { if err != nil {
return nil, errors.Errorf("Error in getting groupPath, %v", err) return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to get the cgroup: %s :%v", err.Error(), output)}, ""
} }
log.Infof("cgroup output: %s", string(output))
parts := strings.Split(string(output), ":")
if len(parts) < 3 {
return "", fmt.Errorf("invalid cgroup entry: %s", string(output)), ""
}
if strings.HasSuffix(parts[len(parts)-3], "0") && parts[len(parts)-2] == "" {
groupPath = parts[len(parts)-1]
}
log.Infof("group path: %s", groupPath)
cgroup2, err := cgroupsv2.LoadManager("/sys/fs/cgroup", groupPath) cgroup2, err := cgroupsv2.LoadManager("/sys/fs/cgroup", string(groupPath))
if err != nil { if err != nil {
return nil, errors.Errorf("Error loading cgroup v2 manager, %v", err) return nil, errors.Errorf("Error loading cgroup v2 manager, %v", err), ""
} }
return cgroup2, nil return cgroup2, nil, groupPath
} }
path := pidPath(pid) path := pidPath(t, index)
cgroup, err := findValidCgroup(path, containerID) cgroup, err := findValidCgroup(path, t, index)
if err != nil { if err != nil {
return nil, errors.Errorf("fail to get cgroup, err: %v", err) return nil, stacktrace.Propagate(err, "could not find valid cgroup"), ""
} }
cgroup1, err := cgroups.Load(cgroups.V1, cgroups.StaticPath(cgroup)) cgroup1, err := cgroups.Load(cgroups.V1, cgroups.StaticPath(cgroup))
if err != nil { if err != nil {
return nil, errors.Errorf("fail to load the cgroup, err: %v", err) return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to load the cgroup: %s", err.Error())}, ""
} }
return cgroup1, nil return cgroup1, nil, ""
} }
// addProcessToCgroup will add the process to cgroup // addProcessToCgroup will add the process to cgroup
// By default it will add to v1 cgroup // By default it will add to v1 cgroup
func addProcessToCgroup(pid int, control interface{}) error { func addProcessToCgroup(pid int, control interface{}, groupPath string) error {
if cgroups.Mode() == cgroups.Unified { if cgroups.Mode() == cgroups.Unified {
var cgroup1 = control.(*cgroupsv2.Manager) args := []string{"-t", "1", "-C", "--", "sudo", "sh", "-c", fmt.Sprintf("echo %d >> /sys/fs/cgroup%s/cgroup.procs", pid, strings.ReplaceAll(groupPath, "\n", ""))}
return cgroup1.AddProc(uint64(pid)) output, err := exec.Command("nsenter", args...).CombinedOutput()
if err != nil {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("failed to add process to cgroup %s: %v", string(output), err),
}
}
return nil
} }
var cgroup1 = control.(cgroups.Cgroup) var cgroup1 = control.(cgroups.Cgroup)
return cgroup1.Add(cgroups.Process{Pid: pid}) return cgroup1.Add(cgroups.Process{Pid: pid})
} }
func injectChaos(t *targetDetails, stressors string, index int, stressType string) (*Command, error) {
stressCommand := fmt.Sprintf("pause nsutil -t %v -p -- %v", strconv.Itoa(t.Pids[index]), stressors)
// for io stress,we need to enter into mount ns of the target container
// enabling it by passing -m flag
if stressType == "pod-io-stress" {
stressCommand = fmt.Sprintf("pause nsutil -t %v -p -m -- %v", strconv.Itoa(t.Pids[index]), stressors)
}
log.Infof("[Info]: starting process: %v", stressCommand)
// launch the stress-ng process on the target container in paused mode
cmd := exec.Command("/bin/bash", "-c", stressCommand)
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
var buf bytes.Buffer
cmd.Stdout = &buf
cmd.Stderr = &buf
err = cmd.Start()
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("failed to start stress process: %s", err.Error())}
}
// add the stress process to the cgroup of target container
if err = addProcessToCgroup(cmd.Process.Pid, t.CGroupManagers[index], t.GroupPath); err != nil {
if killErr := cmd.Process.Kill(); killErr != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup %s and kill stress process: %s", err.Error(), killErr.Error())}
}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup: %s", err.Error())}
}
log.Info("[Info]: Sending signal to resume the stress process")
// wait for the process to start before sending the resume signal
// TODO: need a dynamic way to check the start of the process
time.Sleep(700 * time.Millisecond)
// remove pause and resume or start the stress process
if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to remove pause and start the stress process: %s", err.Error())}
}
return &Command{
Cmd: cmd,
Buffer: buf,
}, nil
}
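Note: injectChaos starts the stress process through a `pause` wrapper so it is stopped at launch, attaches the paused PID to the target container's cgroup, and only then sends SIGCONT. The minimal sketch below reproduces that start-paused/attach/resume sequence with plain shell and stdlib calls; the self-SIGSTOP child and the fixed 700ms wait are stand-ins, not the helper's actual mechanism.

```go
// Standalone sketch of the pause-then-resume pattern (assumed stand-ins).
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

func main() {
	var buf bytes.Buffer
	// The child stops itself, mirroring the "pause" wrapper used by the helper.
	cmd := exec.Command("/bin/sh", "-c", "kill -STOP $$; echo resumed")
	cmd.Stdout = &buf
	cmd.Stderr = &buf
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true} // own process group, as in the helper
	if err := cmd.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	// Placeholder for "add the paused PID to the target container's cgroup".
	time.Sleep(700 * time.Millisecond)
	// Resume the stopped child so the real work only begins after setup.
	if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
		fmt.Println("resume failed:", err)
		return
	}
	_ = cmd.Wait()
	fmt.Printf("child output: %q\n", buf.String())
}
```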
type targetDetails struct {
Name string
Namespace string
TargetContainers []string
ContainerIds []string
Pids []int
CGroupManagers []interface{}
Cmds []*Command
Source string
GroupPath string
}
type Command struct {
Cmd *exec.Cmd
Buffer bytes.Buffer
}

View File

@ -2,29 +2,34 @@ package lib
import ( import (
"context" "context"
"fmt"
"os"
"strconv" "strconv"
"strings" "strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1" apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
//PrepareAndInjectStressChaos contains the preparation & injection steps for the stress experiments. // PrepareAndInjectStressChaos contains the preparation & injection steps for the stress experiments.
func PrepareAndInjectStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareAndInjectStressChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodStressFault")
targetPodList := apiv1.PodList{} defer span.End()
var err error var err error
var podsAffectedPerc int //Set up the tunables if provided in range
//Setup the tunables if provided in range
SetChaosTunables(experimentsDetails) SetChaosTunables(experimentsDetails)
switch experimentsDetails.StressType { switch experimentsDetails.StressType {
@ -56,36 +61,14 @@ func PrepareAndInjectStressChaos(experimentsDetails *experimentTypes.ExperimentD
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" { if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return errors.Errorf("Please provide one of the appLabel or TARGET_PODS") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
} }
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc) targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if experimentsDetails.NodeLabel == "" { if err != nil {
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails) return stacktrace.Propagate(err, "could not get target pods")
if err != nil {
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
} }
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("[Info]: Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime) log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@ -96,41 +79,41 @@ func PrepareAndInjectStressChaos(experimentsDetails *experimentTypes.ExperimentD
if experimentsDetails.ChaosServiceAccount == "" { if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients) experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil { if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err) return stacktrace.Propagate(err, "could not get experiment service account")
} }
} }
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil { if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err return stacktrace.Propagate(err, "could not set helper data")
} }
} }
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "") experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
return nil return nil
} }
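Note: a recurring theme in this changeset is the move from errors.Errorf to typed cerrors values at the lowest layer, wrapped with stacktrace.Propagate further up. The sketch below models that convention with a stand-in Error type (loosely modelled on pkg/cerrors, not a copy of it) to show how the error code and reason survive through the wrapping.

```go
// Simplified sketch of the typed-error + Propagate convention (stand-in type).
package main

import (
	"fmt"

	"github.com/palantir/stacktrace"
)

type ErrorCode string

const ErrorTypeChaosInject ErrorCode = "CHAOS_INJECT_ERROR"

// Error is a minimal stand-in for cerrors.Error.
type Error struct {
	ErrorCode ErrorCode
	Target    string
	Reason    string
}

func (e Error) Error() string {
	return fmt.Sprintf("[%s] target: %s, reason: %s", e.ErrorCode, e.Target, e.Reason)
}

func startStress(pod string) error {
	// The lowest layer returns a typed error carrying full context.
	return Error{ErrorCode: ErrorTypeChaosInject, Target: pod, Reason: "failed to start stress process"}
}

func injectChaos(pod string) error {
	if err := startStress(pod); err != nil {
		// Higher layers only annotate; the original code and reason are preserved.
		return stacktrace.Propagate(err, "could not run chaos on %s", pod)
	}
	return nil
}

func main() {
	fmt.Println(injectChaos("nginx-0"))
}
```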
// injectChaosInSerialMode inject the stress chaos in all target application serially (one by one) // injectChaosInSerialMode inject the stress chaos in all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInSerialMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -140,10 +123,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided { if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{ log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
@ -151,115 +131,69 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"NodeName": pod.Spec.NodeName, "NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer, "ContainerName": experimentsDetails.TargetContainer,
}) })
runID := common.GetRunID() runID := stringutils.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil { if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err) return stacktrace.Propagate(err, "could not create helper pod")
} }
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for stress chaos
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
} }
} }
return nil return nil
} }
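Note: in the rewritten serial path, when no TARGET_CONTAINER is provided the first container of the pod spec is used directly instead of querying the API again. A small illustrative sketch of that rule, with a hand-built pod:

```go
// Sketch of the default-target-container rule; pod construction is illustrative.
package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
)

func defaultTargetContainer(pod apiv1.Pod, provided string) string {
	if provided != "" {
		return provided
	}
	// Mirrors: experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
	return pod.Spec.Containers[0].Name
}

func main() {
	pod := apiv1.Pod{Spec: apiv1.PodSpec{Containers: []apiv1.Container{{Name: "app"}, {Name: "sidecar"}}}}
	fmt.Println(defaultTargetContainer(pod, ""))   // app
	fmt.Println(defaultTargetContainer(pod, "db")) // db
}
```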
// injectChaosInParallelMode inject the stress chaos in all target application in parallel mode (all at once) // injectChaosInParallelMode inject the stress chaos in all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
// creating the helper pod to perform stress chaos runID := stringutils.GetRunID()
for _, pod := range targetPodList.Items { targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//Get the target container name of the application pod for node, tar := range targets {
if !experimentsDetails.IsTargetContainerProvided { var targetsPerNode []string
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients) for _, k := range tar.Target {
if err != nil { targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
return errors.Errorf("unable to get the target container name, err: %v", err)
}
} }
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{ if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
"PodName": pod.Name, return stacktrace.Propagate(err, "could not create helper pod")
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix)
if err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
} }
} }
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pods") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for stress chaos
log.Info("[Cleanup]: Deleting all the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
} }
return nil return nil
} }
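Note: the parallel path now groups targets by node and encodes each target as podName:namespace:container, joining them with ";" so a single helper pod per node can receive the whole list through the TARGETS environment variable. A minimal sketch of that grouping with hypothetical types:

```go
// Sketch with stand-in types; same encoding idea as the parallel path above.
package main

import (
	"fmt"
	"strings"
)

type target struct {
	Name, Namespace, Container, Node string
}

// targetsPerNode returns one encoded TARGETS string per node.
func targetsPerNode(targets []target) map[string]string {
	perNode := map[string][]string{}
	for _, t := range targets {
		perNode[t.Node] = append(perNode[t.Node], fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	out := map[string]string{}
	for node, list := range perNode {
		out[node] = strings.Join(list, ";")
	}
	return out
}

func main() {
	ts := []target{
		{"web-0", "default", "nginx", "node-a"},
		{"web-1", "default", "nginx", "node-a"},
		{"web-2", "default", "nginx", "node-b"},
	}
	for node, v := range targetsPerNode(ts) {
		fmt.Printf("%s -> TARGETS=%q\n", node, v)
	}
}
```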
// createHelperPod derive the attributes for helper pod and create the helper pod // createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, labelSuffix string) error { func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodStressFaultHelperPod")
defer span.End()
privilegedEnable := true privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds) terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{ helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{ ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID, GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace, Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName), Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations, Annotations: chaosDetails.Annotations,
}, },
Spec: apiv1.PodSpec{ Spec: apiv1.PodSpec{
HostPID: true, HostPID: true,
@ -301,7 +235,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name stress-chaos", "./helpers -name stress-chaos",
}, },
Resources: chaosDetails.Resources, Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName), Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{ VolumeMounts: []apiv1.VolumeMount{
{ {
Name: "socket-path", Name: "socket-path",
@ -326,18 +260,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
}, },
} }
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{}) if len(chaosDetails.SideCar) != 0 {
return err helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
} }
// getPodEnv derive all the env required for the helper pod // getPodEnv derive all the env required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar { func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS). envDetails.SetEnv("TARGETS", targets).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)). SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace). SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName). SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@ -354,6 +293,8 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("VOLUME_MOUNT_PATH", experimentsDetails.VolumeMountPath). SetEnv("VOLUME_MOUNT_PATH", experimentsDetails.VolumeMountPath).
SetEnv("STRESS_TYPE", experimentsDetails.StressType). SetEnv("STRESS_TYPE", experimentsDetails.StressType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID). SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name") SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV return envDetails.ENV
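Note: the helper now receives the encoded TARGETS string plus the OTEL endpoint and marshalled trace parent through its environment. The condensed sketch below uses a stand-in for common.ENVDetails to show the fluent-builder shape, including the Downward API entry that gives the helper its own pod name; it is not the package's actual implementation.

```go
// Stand-in for pkg/utils/common's env builder; shape only, not the real code.
package main

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
)

type envDetails struct{ ENV []apiv1.EnvVar }

func (e *envDetails) SetEnv(name, value string) *envDetails {
	// Skip empty values so optional tunables do not produce empty env vars
	// (assumed behaviour of the real builder).
	if value != "" {
		e.ENV = append(e.ENV, apiv1.EnvVar{Name: name, Value: value})
	}
	return e
}

func (e *envDetails) SetEnvFromDownwardAPI(apiVersion, fieldPath string) *envDetails {
	e.ENV = append(e.ENV, apiv1.EnvVar{
		Name: "POD_NAME",
		ValueFrom: &apiv1.EnvVarSource{
			FieldRef: &apiv1.ObjectFieldSelector{APIVersion: apiVersion, FieldPath: fieldPath},
		},
	})
	return e
}

func main() {
	var e envDetails
	e.SetEnv("TARGETS", "web-0:default:nginx").
		SetEnv("TOTAL_CHAOS_DURATION", "60").
		SetEnvFromDownwardAPI("v1", "metadata.name")
	fmt.Println(len(e.ENV), "env vars prepared")
}
```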
@ -363,8 +304,8 @@ func ptrint64(p int64) *int64 {
return &p return &p
} }
//SetChaosTunables will setup a random value within a given range of values // SetChaosTunables will set up a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value. // If the value is not provided in range it'll set up the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) { func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.CPUcores = common.ValidateRange(experimentsDetails.CPUcores) experimentsDetails.CPUcores = common.ValidateRange(experimentsDetails.CPUcores)
experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad) experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)
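Note: a hedged illustration of what "a random value within a given range" could mean for tunables such as CPUcores or CPULoad, assuming range strings like "2-4"; this is only a guess at the behaviour of common.ValidateRange, not its implementation.

```go
// Illustrative only; not the actual common.ValidateRange.
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
)

func resolveRange(value string) string {
	parts := strings.Split(value, "-")
	if len(parts) != 2 {
		return value // plain value (or empty): keep as provided
	}
	min, err1 := strconv.Atoi(parts[0])
	max, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil || max < min {
		return value
	}
	return strconv.Itoa(min + rand.Intn(max-min+1))
}

func main() {
	fmt.Println(resolveRange("2"))   // 2
	fmt.Println(resolveRange("2-4")) // 2, 3 or 4
}
```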

View File

@ -1,28 +1,34 @@
package lib package lib
import ( import (
"context"
"fmt"
"os" "os"
"os/signal" "os/signal"
"strings" "strings"
"syscall" "syscall"
"time" "time"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/vmware" "github.com/litmuschaos/litmus-go/pkg/cloud/vmware"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log" "github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe" "github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/vmware/vm-poweroff/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/vmware/vm-poweroff/types"
"github.com/pkg/errors" "github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
) )
var inject, abort chan os.Signal var inject, abort chan os.Signal
// InjectVMPowerOffChaos injects the chaos in serial or parallel mode // InjectVMPowerOffChaos injects the chaos in serial or parallel mode
func InjectVMPowerOffChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, cookie string) error { func InjectVMPowerOffChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, cookie string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMPowerOffFault")
defer span.End()
// inject channel is used to transmit signal notifications. // inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1) inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel. // Catch and relay certain signal(s) to inject channel.
@ -47,15 +53,15 @@ func InjectVMPowerOffChaos(experimentsDetails *experimentTypes.ExperimentDetails
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err := injectChaosInSerialMode(experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err := injectChaosInSerialMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err := injectChaosInParallelMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -68,7 +74,10 @@ func InjectVMPowerOffChaos(experimentsDetails *experimentTypes.ExperimentDetails
} }
// injectChaosInSerialMode stops VMs in serial mode i.e. one after the other // injectChaosInSerialMode stops VMs in serial mode i.e. one after the other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInSerialMode")
defer span.End()
select { select {
case <-inject: case <-inject:
// stopping the chaos execution, if abort signal received // stopping the chaos execution, if abort signal received
@ -93,7 +102,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Stopping the VM //Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId) log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil { if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("failed to stop %s vm: %s", vmId, err.Error()) return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
} }
common.SetTargets(vmId, "injected", "VM", chaosDetails) common.SetTargets(vmId, "injected", "VM", chaosDetails)
@ -101,14 +110,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for the VM to completely stop //Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId) log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil { if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("vm %s failed to successfully shutdown, err: %s", vmId, err.Error()) return stacktrace.Propagate(err, "VM shutdown failed")
} }
//Run the probes during the chaos //Run the probes during the chaos
//The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration //The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 { if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "failed to run probes")
} }
} }
@ -119,13 +128,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Starting the VM //Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId) log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil { if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("failed to start back %s vm: %s", vmId, err.Error()) return stacktrace.Propagate(err, "failed to start back vm")
} }
//Wait for the VM to completely start //Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId) log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil { if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("vm %s failed to successfully start, err: %s", vmId, err.Error()) return stacktrace.Propagate(err, "vm failed to start")
} }
common.SetTargets(vmId, "reverted", "VM", chaosDetails) common.SetTargets(vmId, "reverted", "VM", chaosDetails)
@ -139,7 +148,9 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
// injectChaosInParallelMode stops VMs in parallel mode i.e. all at once // injectChaosInParallelMode stops VMs in parallel mode i.e. all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInParallelMode")
defer span.End()
select { select {
case <-inject: case <-inject:
@ -165,7 +176,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Stopping the VM //Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId) log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil { if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("failed to stop %s vm: %s", vmId, err.Error()) return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
} }
common.SetTargets(vmId, "injected", "VM", chaosDetails) common.SetTargets(vmId, "injected", "VM", chaosDetails)
@ -176,14 +187,14 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for the VM to completely stop //Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId) log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil { if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("vm %s failed to successfully shutdown, err: %s", vmId, err.Error()) return stacktrace.Propagate(err, "vm failed to shutdown")
} }
} }
//Running the probes during chaos //Running the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return stacktrace.Propagate(err, "failed to run probes")
} }
} }
@ -196,7 +207,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Starting the VM //Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId) log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil { if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("failed to start back %s vm: %s", vmId, err.Error()) return stacktrace.Propagate(err, fmt.Sprintf("failed to start back %s vm", vmId))
} }
} }
@ -205,7 +216,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for the VM to completely start //Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId) log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil { if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("vm %s failed to successfully start, err: %s", vmId, err.Error()) return stacktrace.Propagate(err, "vm failed to successfully start")
} }
} }
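Note: the VM power-off library relays abort signals into the inject channel and decides, before any chaos starts, whether to proceed or exit cleanly. A compact sketch of that select pattern; the placeholder comment stands in for the stop/wait/probe/start cycle shown in the hunks above.

```go
// Compact sketch of the inject/abort signal relay; stop/start steps are placeholders.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	inject := make(chan os.Signal, 1)
	// Relay interrupt/termination signals to the inject channel.
	signal.Notify(inject, os.Interrupt, syscall.SIGTERM)

	select {
	case <-inject:
		// Abort received before injection: exit without touching any VM.
		fmt.Println("chaos aborted before injection")
		os.Exit(1)
	default:
		// Placeholder for: stop VM -> wait for POWERED_OFF -> run probes ->
		// wait chaos duration -> start VM -> wait for POWERED_ON.
		fmt.Println("injecting chaos")
		time.Sleep(100 * time.Millisecond)
	}
}
```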

View File

@ -1,272 +0,0 @@
package lib
import (
"context"
"strconv"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
appsv1 "k8s.io/api/apps/v1"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.ChaosServiceAccount == "" {
// Getting the serviceAccountName for the powerfulseal pod
err := GetServiceAccount(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
}
}
// generating a unique string which can be appended to the powerfulseal deployment name & labels for unique identification
runID := common.GetRunID()
// generating the chaos inject event in the chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Creating configmap for powerfulseal deployment
err := CreateConfigMap(experimentsDetails, clients, runID)
if err != nil {
return err
}
// Creating powerfulseal deployment
err = CreatePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
//checking the status of the powerfulseal pod, wait till the powerfulseal pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "name=powerfulseal-"+runID, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("powerfulseal pod is not in running state, err: %v", err)
}
// Wait for Chaos Duration
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
//Deleting the powerfulseal deployment
log.Info("[Cleanup]: Deleting the powerfulseal deployment")
err = DeletePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal deployment, err: %v", err)
}
//Deleting the powerfulseal configmap
log.Info("[Cleanup]: Deleting the powerfulseal configmap")
err = DeletePowerfulsealConfigmap(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal configmap, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// GetServiceAccount find the serviceAccountName for the powerfulseal deployment
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(context.Background(), experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
// CreateConfigMap creates a configmap for the powerfulseal deployment
func CreateConfigMap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
data := map[string]string{}
// It will store all the details inside a string in a well-formatted way
policy := GetConfigMapData(experimentsDetails)
data["policy"] = policy
configMap := &apiv1.ConfigMap{
ObjectMeta: v1.ObjectMeta{
Name: "policy-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"name": "policy-" + runID,
},
},
Data: data,
}
_, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Create(context.Background(), configMap, v1.CreateOptions{})
return err
}
// GetConfigMapData generates the configmap data for the powerfulseal deployments in the desired format
func GetConfigMapData(experimentsDetails *experimentTypes.ExperimentDetails) string {
waitTime, _ := strconv.Atoi(experimentsDetails.ChaosInterval)
policy := "config:" + "\n" +
" minSecondsBetweenRuns: 1" + "\n" +
" maxSecondsBetweenRuns: " + strconv.Itoa(waitTime) + "\n" +
"podScenarios:" + "\n" +
" - name: \"delete random pods in application namespace\"" + "\n" +
" match:" + "\n" +
" - labels:" + "\n" +
" namespace: " + experimentsDetails.AppNS + "\n" +
" selector: " + experimentsDetails.AppLabel + "\n" +
" filters:" + "\n" +
" - randomSample:" + "\n" +
" size: 1" + "\n" +
" actions:" + "\n" +
" - kill:" + "\n" +
" probability: 0.77" + "\n" +
" force: " + strconv.FormatBool(experimentsDetails.Force)
return policy
}
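Note: for readability, here is the policy that GetConfigMapData assembles, rendered for sample inputs (ChaosInterval="10", AppNS="default", AppLabel="app=nginx", Force=false). The indentation is approximated because the compare view collapses the leading spaces of the original string literals.

```go
// Rendered view of the generated powerfulseal policy; indentation approximated.
package main

import "fmt"

func main() {
	policy := `config:
  minSecondsBetweenRuns: 1
  maxSecondsBetweenRuns: 10
podScenarios:
  - name: "delete random pods in application namespace"
    match:
      - labels:
          namespace: default
          selector: app=nginx
    filters:
      - randomSample:
          size: 1
    actions:
      - kill:
          probability: 0.77
          force: false`
	fmt.Println(policy)
}
```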
// CreatePowerfulsealDeployment derive the attributes for powerfulseal deployment and create it
func CreatePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
deployment := &appsv1.Deployment{
ObjectMeta: v1.ObjectMeta{
Name: "powerfulseal-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": "powerfulseal",
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
},
Spec: appsv1.DeploymentSpec{
Selector: &v1.LabelSelector{
MatchLabels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Replicas: func(i int32) *int32 { return &i }(1),
Template: apiv1.PodTemplateSpec{
ObjectMeta: v1.ObjectMeta{
Labels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Spec: apiv1.PodSpec{
Volumes: []apiv1.Volume{
{
Name: "policyfile",
VolumeSource: apiv1.VolumeSource{
ConfigMap: &apiv1.ConfigMapVolumeSource{
LocalObjectReference: apiv1.LocalObjectReference{
Name: "policy-" + runID,
},
},
},
},
},
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: func(i int64) *int64 { return &i }(0),
Containers: []apiv1.Container{
{
Name: "powerfulseal",
Image: "ksatchit/miko-powerfulseal:non-ssh",
Args: []string{
"autonomous",
"--inventory-kubernetes",
"--no-cloud",
"--policy-file=/root/policy_kill_random_default.yml",
"--use-pod-delete-instead-of-ssh-kill",
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "policyfile",
MountPath: "/root/policy_kill_random_default.yml",
SubPath: "policy",
},
},
},
},
},
},
},
}
_, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Create(context.Background(), deployment, v1.CreateOptions{})
return err
}
//DeletePowerfulsealDeployment delete the powerfulseal deployment
func DeletePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Delete(context.Background(), "powerfulseal-"+runID, v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).List(context.Background(), v1.ListOptions{LabelSelector: "name=powerfulseal-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("Deployment is not deleted yet, err: %v", err)
}
return nil
})
return err
}
//DeletePowerfulsealConfigmap delete the powerfulseal configmap
func DeletePowerfulsealConfigmap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Delete(context.Background(), "policy-"+runID, v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).List(context.Background(), v1.ListOptions{LabelSelector: "name=policy-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("configmap is not deleted yet, err: %v", err)
}
return nil
})
return err
}
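Note: both Delete* helpers above follow the same poll-until-gone pattern: delete, then list by label once a second for up to 90 attempts until nothing matches. The sketch below uses a plain loop as a stand-in for pkg/utils/retry to show the shape of that wait.

```go
// Generic poll-until-gone sketch; waitUntil is a stand-in for pkg/utils/retry.
package main

import (
	"errors"
	"fmt"
	"time"
)

func waitUntil(times int, wait time.Duration, check func() error) error {
	var err error
	for i := 0; i < times; i++ {
		if err = check(); err == nil {
			return nil
		}
		time.Sleep(wait)
	}
	return err
}

func main() {
	remaining := 3 // pretend three matching objects still exist
	err := waitUntil(90, 10*time.Millisecond, func() error {
		if remaining > 0 {
			remaining--
			return errors.New("deployment is not deleted yet")
		}
		return nil
	})
	fmt.Println("deleted:", err == nil)
}
```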

View File

@ -1,364 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
"time"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/container-kill/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
//Setup the tunables if provided in range
litmusLIB.SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The tunables are:", logrus.Fields{
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode kill the container of all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
//GetRestartCount return the restart count of target container
restartCountBefore := getRestartCount(pod, experimentsDetails.TargetContainer)
log.Infof("restartCount of target container before chaos injection: %v", restartCountBefore)
runID := common.GetRunID()
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"Target Container": experimentsDetails.TargetContainer,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// It will verify that the restart count of container should increase after chaos injection
if err := verifyRestartCount(experimentsDetails, pod, clients, restartCountBefore); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("target container is not restarted, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode kill the container of all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
var err error
//GetRestartCount return the restart count of target container
restartCountBefore := getRestartCountAll(targetPodList, experimentsDetails.TargetContainer)
log.Infof("restartCount of target containers before chaos injection: %v", restartCountBefore)
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"Target Container": experimentsDetails.TargetContainer,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// It will verify that the restart count of container should increase after chaos injection
if err := verifyRestartCountAll(experimentsDetails, targetPodList, clients, restartCountBefore); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("target container is not restarted , err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
//getRestartCount return the restart count of target container
func getRestartCount(targetPod apiv1.Pod, containerName string) int {
restartCount := 0
for _, container := range targetPod.Status.ContainerStatuses {
if container.Name == containerName {
restartCount = int(container.RestartCount)
break
}
}
return restartCount
}
//getRestartCountAll return the restart count of all target container
func getRestartCountAll(targetPodList apiv1.PodList, containerName string) []int {
restartCount := []int{}
for _, pod := range targetPodList.Items {
restartCount = append(restartCount, getRestartCount(pod, containerName))
}
return restartCount
}
//verifyRestartCount verifies whether the target container was restarted after chaos injection
// the restart count of the container should increase after chaos injection
func verifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, pod apiv1.Pod, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0
err := retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), pod.Name, v1.GetOptions{})
if err != nil {
return err
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
restartCountAfter = int(container.RestartCount)
break
}
}
return nil
})
if err != nil {
return err
}
// it will fail if the restart count does not increase
if restartCountAfter <= restartCountBefore {
return errors.Errorf("target container is not restarted")
}
log.Infof("restartCount of target container after chaos injection: %v", restartCountAfter)
return nil
}
//verifyRestartCountAll verifies whether all the target containers were restarted after chaos injection
// the restart count of each container should increase after chaos injection
func verifyRestartCountAll(experimentsDetails *experimentTypes.ExperimentDetails, podList apiv1.PodList, clients clients.ClientSets, restartCountBefore []int) error {
for index, pod := range podList.Items {
if err := verifyRestartCount(experimentsDetails, pod, clients, restartCountBefore[index]); err != nil {
return err
}
}
return nil
}
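Note: the success criterion encoded by verifyRestartCount is simply that the container's restart count strictly increases during the chaos window. A tiny sketch of that check with illustrative counts:

```go
// Sketch of the restart-count success criterion; counts are illustrative.
package main

import (
	"errors"
	"fmt"
)

func verifyRestart(before, after int) error {
	if after <= before {
		return errors.New("target container is not restarted")
	}
	return nil
}

func main() {
	fmt.Println(verifyRestart(2, 3)) // <nil> -> restart detected
	fmt.Println(verifyRestart(2, 2)) // error -> chaos did not take effect
}
```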
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"sudo",
"-E",
},
Args: []string{
"pumba",
"--random",
"--interval",
strconv.Itoa(experimentsDetails.ChaosInterval) + "s",
"kill",
"--signal",
experimentsDetails.Signal,
"re2:k8s_" + experimentsDetails.TargetContainer + "_" + appName,
},
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}
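For context, a minimal standalone sketch of the command line these helper pod args assemble into, with purely illustrative values for the interval, signal, container, and pod name:

package main

import (
    "fmt"
    "strconv"
    "strings"
)

func main() {
    // illustrative values only
    chaosInterval := 10 // seconds
    signal := "SIGKILL"
    targetContainer := "nginx"
    appName := "my-app-7d9f8b6c4-xk2lp"

    // mirrors the Args list built in createHelperPod above
    args := []string{
        "pumba", "--random",
        "--interval", strconv.Itoa(chaosInterval) + "s",
        "kill", "--signal", signal,
        "re2:k8s_" + targetContainer + "_" + appName,
    }
    // the helper container prefixes this with "sudo -E"
    fmt.Println("sudo -E " + strings.Join(args, " "))
}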


@ -1,280 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodCPUHog contains preparation steps before chaos injection
func PreparePodCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//setup the tunables if provided in range
litmusLIB.SetChaosTunables(experimentsDetails)
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stresses the CPU of all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform cpu chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"CPUcores": experimentsDetails.CPUcores,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode stresses the CPU of all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform cpu chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"CPUcores": experimentsDetails.CPUcores,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"sudo",
"-E",
},
Args: getContainerArguments(experimentsDetails, appName),
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"pumba",
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stress-image",
experimentsDetails.StressImage,
"--stressors",
"--cpu " + experimentsDetails.CPUcores + " --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}
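For context, a standalone sketch of the full command the CPU-stress args above expand to, with illustrative inputs (duration, core count, stress image, and pod name are assumptions, not repository defaults):

package main

import (
    "fmt"
    "strings"
)

func main() {
    // illustrative inputs only
    duration := "60s"
    cpuCores := "2"
    stressImage := "alexeiled/stress-ng:latest-ubuntu"
    podName := "my-app-7d9f8b6c4-xk2lp"

    // mirrors getContainerArguments above: the --stressors value is passed as a single string
    args := []string{
        "pumba", "--log-level", "debug",
        "--label", "io.kubernetes.pod.name=" + podName,
        "stress", "--duration", duration,
        "--stress-image", stressImage,
        "--stressors", "--cpu " + cpuCores + " --timeout " + duration,
    }
    fmt.Println("sudo -E " + strings.Join(args, " "))
}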


@ -1,281 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodMemoryHog contains preparation steps before chaos injection
func PreparePodMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//setup the tunables if provided in range
litmusLIB.SetChaosTunables(experimentsDetails)
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stresses the memory of all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform memory chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"MemoryBytes": experimentsDetails.MemoryConsumption,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode stresses the memory of all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform memory chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"MemoryBytes": experimentsDetails.MemoryConsumption,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"sudo",
"-E",
},
Args: getContainerArguments(experimentsDetails, appName),
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"pumba",
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stress-image",
experimentsDetails.StressImage,
"--stressors",
"--cpu 1 --vm 1 --vm-bytes " + experimentsDetails.MemoryConsumption + "M --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}
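With hypothetical values (60s duration, 500M memory consumption), the stressors string built above expands as in this minimal sketch:

package main

import "fmt"

func main() {
    // illustrative values only
    memoryConsumption := "500" // MB, kept as a string to match the field type above
    duration := "60"           // seconds
    stressors := "--cpu 1 --vm 1 --vm-bytes " + memoryConsumption + "M --timeout " + duration + "s"
    fmt.Println(stressors) // --cpu 1 --vm 1 --vm-bytes 500M --timeout 60s
}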


@ -1,43 +0,0 @@
package corruption
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "corrupt", "--percent", experimentsDetails.NetworkPacketCorruptionPercentage)
return args, nil
}
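For context, a standalone sketch of the netem command these args produce, with illustrative values; the duplication, latency, and loss variants below differ only in the trailing subcommand and flag:

package main

import (
    "fmt"
    "strings"
)

func main() {
    // illustrative values only
    tcImage := "gaiadocker/iproute2"
    networkInterface := "eth0"
    duration := "60s"
    corruptionPercent := "100"

    args := []string{
        "pumba", "netem",
        "--tc-image", tcImage,
        "--interface", networkInterface,
        "--duration", duration,
        // appended only when destination IPs/hosts are provided
        "--target", "10.0.0.1",
        "corrupt", "--percent", corruptionPercent,
    }
    // duplication uses "duplicate --percent", latency "delay --time", loss "loss --percent"
    fmt.Println(strings.Join(args, " "))
}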


@ -1,43 +0,0 @@
package duplication
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "duplicate", "--percent", experimentsDetails.NetworkPacketDuplicationPercentage)
return args, nil
}


@ -1,43 +0,0 @@
package latency
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "delay", "--time", strconv.Itoa(experimentsDetails.NetworkLatency))
return args, nil
}


@ -1,43 +0,0 @@
package loss
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "loss", "--percent", experimentsDetails.NetworkPacketLossPercentage)
return args, nil
}


@ -1,305 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareAndInjectChaos contains the preparation and chaos injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args []string) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
//setup the tunables if provided in range
litmusLIB.SetChaosTunables(experimentsDetails)
switch experimentsDetails.NetworkChaosType {
case "network-loss":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketLossPercentage": experimentsDetails.NetworkPacketLossPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "network-latency":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkLatency": strconv.Itoa(experimentsDetails.NetworkLatency),
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "network-corruption":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketCorruptionPercentage": experimentsDetails.NetworkPacketCorruptionPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "network-duplication":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketDuplicationPercentage": experimentsDetails.NetworkPacketDuplicationPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
default:
return errors.Errorf("invalid experiment, please check the environment.go")
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode injects network chaos on all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args []string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
})
// args contains details of the specific chaos injection
// constructing `argsWithRegex` based on updated regex with a diff pod name
// without extending/concatenating the args var itself
argsWithRegex := append(args, "re2:k8s_POD_"+pod.Name+"_"+experimentsDetails.AppNS)
log.Infof("Arguments for running %v are %v", experimentsDetails.ExperimentName, argsWithRegex)
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Spec.NodeName, runID, argsWithRegex, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, chaosDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode injects network chaos on all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args []string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
})
// args contains details of the specific chaos injection
// constructing `argsWithRegex` based on updated regex with a diff pod name
// without extending/concatenating the args var itself
argsWithRegex := append(args, "re2:k8s_POD_"+pod.Name+"_"+experimentsDetails.AppNS)
log.Infof("Arguments for running %v are %v", experimentsDetails.ExperimentName, argsWithRegex)
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Spec.NodeName, runID, argsWithRegex, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, chaosDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName, runID string, args []string, labelSuffix string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"sudo",
"-E",
},
Args: args,
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}
// AddTargetIpsArgs inserts a comma-separated list of targetIPs (if provided by the user) into the pumba command/args
func AddTargetIpsArgs(targetIPs, targetHosts string, args []string) ([]string, error) {
targetIPs, err := network_chaos.GetTargetIps(targetIPs, targetHosts, clients.ClientSets{}, false)
if err != nil {
return nil, err
}
if targetIPs == "" {
return args, nil
}
ips := strings.Split(targetIPs, ",")
for i := range ips {
args = append(args, "--target", strings.TrimSpace(ips[i]))
}
return args, nil
}
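A small self-contained sketch of the IP-splitting behaviour this function applies, with hypothetical inputs; it mirrors only the loop over the comma-separated list, not the hostname resolution performed by GetTargetIps:

package main

import (
    "fmt"
    "strings"
)

// appendTargetArgs mirrors the loop above: every comma-separated IP becomes its own --target flag
func appendTargetArgs(targetIPs string, args []string) []string {
    if targetIPs == "" {
        return args
    }
    for _, ip := range strings.Split(targetIPs, ",") {
        args = append(args, "--target", strings.TrimSpace(ip))
    }
    return args
}

func main() {
    args := appendTargetArgs("10.0.0.1, 10.0.0.2", []string{"pumba", "netem"})
    fmt.Println(args) // [pumba netem --target 10.0.0.1 --target 10.0.0.2]
}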


@ -1,306 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodIOStress contains preparation steps before chaos injection
func PreparePodIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//setup the tunables if provided in range
litmusLIB.SetChaosTunables(experimentsDetails)
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stresses the I/O of all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode stresses the I/O of all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"sudo",
"-E",
},
Args: getContainerArguments(experimentsDetails, appName),
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
var hddbytes string
if experimentsDetails.FilesystemUtilizationBytes == "0" {
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = "10%"
log.Info("Neither of FilesystemUtilizationPercentage or FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
}
} else {
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = experimentsDetails.FilesystemUtilizationBytes + "G"
} else {
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
stressArgs := []string{
"pumba",
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stress-image",
experimentsDetails.StressImage,
"--stressors",
}
args := stressArgs
if experimentsDetails.VolumeMountPath == "" {
args = append(args, "--cpu 1 --io "+experimentsDetails.NumberOfWorkers+" --hdd "+experimentsDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes+" --timeout "+strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
} else {
args = append(args, "--cpu 1 --io "+experimentsDetails.NumberOfWorkers+" --hdd "+experimentsDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes+" --temp-path "+experimentsDetails.VolumeMountPath+" --timeout "+strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
}
return args
}
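The bytes-versus-percentage precedence implemented above can be read as a small pure function; a sketch with hypothetical inputs (the helper name is illustrative):

package main

import "fmt"

// resolveHddBytes mirrors the precedence above: a non-zero percentage always wins,
// a non-zero byte value is used only when the percentage is "0", and both being "0"
// falls back to a default of 10%
func resolveHddBytes(utilBytes, utilPercentage string) string {
    switch {
    case utilBytes == "0" && utilPercentage == "0":
        return "10%"
    case utilPercentage != "0":
        return utilPercentage + "%"
    default:
        return utilBytes + "G"
    }
}

func main() {
    fmt.Println(resolveHddBytes("0", "0"))  // 10%
    fmt.Println(resolveHddBytes("5", "0"))  // 5G
    fmt.Println(resolveHddBytes("5", "40")) // 40% (percentage takes precedence)
}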


@ -1,7 +1,11 @@
package lib
import (
+ "context"
+ "fmt"
"os"
+ "github.com/litmuschaos/litmus-go/pkg/cerrors"
+ "github.com/palantir/stacktrace"
"os/signal"
"syscall"
"time"
@ -13,7 +17,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
- "github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
@ -25,18 +28,24 @@ func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, podName
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
- return errors.Errorf("unable to run command inside target container, err: %v", err)
+ return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to inject chaos: %s", err.Error())}
}
return nil
}
- func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ // Get the target pod details for the chaos execution
+ // if the target pod is not defined it will derive the random target pod list using pod affected percentage
+ if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
+ return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
+ }
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
- return err
+ return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@ -45,23 +54,29 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
}
log.Infof("Target pods list for chaos, %v", podNames)
- //Get the target container name of the application pod
- if experimentsDetails.TargetContainer == "" {
- experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
- if err != nil {
- return errors.Errorf("unable to get the target container name, err: %v", err)
- }
- }
- return runChaos(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
- }
- func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ return runChaos(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
+ }
+ func runChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ // run the probes during chaos
+ if len(resultDetails.ProbeDetails) != 0 {
+ if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+ return err
+ }
+ }
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
+ experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
for _, pod := range targetPodList.Items {
+ //Get the target container name of the application pod
+ if !experimentsDetails.IsTargetContainerProvided {
+ experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
+ }
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
@ -99,14 +114,17 @@ func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodLi
}
}
if err := killChaos(experimentsDetails, pod.Name, clients); err != nil {
- return err
+ return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
//PrepareChaos contains the preparation steps before chaos injection
- func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ // @TODO: setup tracing
+ // ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectChaos")
+ // defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -114,8 +132,8 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
- if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
- return err
+ if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
+ return stacktrace.Propagate(err, "could not execute experiment")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
@ -134,7 +152,7 @@ func killChaos(experimentsDetails *experimentTypes.ExperimentDetails, podName st
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command) _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil { if err != nil {
return errors.Errorf("unable to kill the process in %v pod, err: %v", podName, err) return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}
} }
return nil return nil
} }
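The hunks above replace free-form errors.Errorf messages with structured errors from the cerrors package and wrap propagation in stacktrace.Propagate. A minimal sketch of that pattern, assuming only the field names shown in the diff (ErrorCode, Target, Reason); the helper names wrapInjectError and propagateInject are illustrative, not part of the library:

package lib

import (
	"fmt"

	"github.com/litmuschaos/litmus-go/pkg/cerrors"
	"github.com/palantir/stacktrace"
)

// wrapInjectError (illustrative) converts a raw exec failure into a typed
// ChaosInject error that carries the target pod and namespace.
func wrapInjectError(podName, namespace string, err error) error {
	if err == nil {
		return nil
	}
	return cerrors.Error{
		ErrorCode: cerrors.ErrorTypeChaosInject,
		Target:    fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace),
		Reason:    fmt.Sprintf("failed to inject chaos: %s", err.Error()),
	}
}

// propagateInject shows how callers add a short breadcrumb while keeping the
// typed error intact for the result/verdict machinery further up the stack.
func propagateInject(err error) error {
	return stacktrace.Propagate(err, "could not inject chaos")
}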

View File

@ -1,6 +1,9 @@
package lib package lib
import ( import (
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/palantir/stacktrace"
"context" "context"
clients "github.com/litmuschaos/litmus-go/pkg/clients" clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
@ -10,20 +13,26 @@ import (
"github.com/litmuschaos/litmus-go/pkg/status" "github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors" "github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus" "github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1" corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1" v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
) )
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
// Get the target pod details for the chaos execution // Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage // if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails) targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil { if err != nil {
return err return stacktrace.Propagate(err, "could not get target pods")
} }
podNames := []string{} podNames := []string{}
for _, pod := range targetPodList.Items { for _, pod := range targetPodList.Items {
@ -31,51 +40,48 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
} }
log.Infof("Target pods list for chaos, %v", podNames) log.Infof("Target pods list for chaos, %v", podNames)
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
// Getting the serviceAccountName, need permission inside helper pod to create the events // Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" { if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients) experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil { if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err) return stacktrace.Propagate(err, "could not get experiment service account")
} }
} }
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil { if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err return stacktrace.Propagate(err, "could not set helper data")
} }
} }
return runChaos(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails) return runChaos(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
} }
func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func runChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod" msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails) types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine") events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
} }
labelSuffix := common.GetRunID()
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
// creating the helper pod to perform container kill chaos // creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items { for _, pod := range targetPodList.Items {
runID := common.GetRunID() //Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
runID := stringutils.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{ log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name, "Target Pod": pod.Name,
@ -83,34 +89,16 @@ func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodLi
"Target Container": experimentsDetails.TargetContainer, "Target Container": experimentsDetails.TargetContainer,
}) })
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil { if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err) return stacktrace.Propagate(err, "could not create helper pod")
} }
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails) common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
log.Info("[Status]: Checking the status of the helper pod") return err
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
} }
} }
@ -118,7 +106,10 @@ func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodLi
} }
//PrepareChaos contains the preparation steps before chaos injection //PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Prepare[name-your-chaos]Fault")
// defer span.End()
//Waiting for the ramp time before chaos injection //Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
@ -126,8 +117,8 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
common.WaitForDuration(experimentsDetails.RampTime) common.WaitForDuration(experimentsDetails.RampTime)
} }
//Starting the CPU stress experiment //Starting the CPU stress experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil { if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
return err return stacktrace.Propagate(err, "could not execute chaos")
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 { if experimentsDetails.RampTime != 0 {
@ -138,13 +129,16 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
} }
// createHelperPod derive the attributes for helper pod and create the helper pod // createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error { func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Create[name-your-chaos]FaultHelperPod")
// defer span.End()
helperPod := &corev1.Pod{ helperPod := &corev1.Pod{
ObjectMeta: v1.ObjectMeta{ ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID, GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace, Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName), Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations, Annotations: chaosDetails.Annotations,
}, },
Spec: corev1.PodSpec{ Spec: corev1.PodSpec{
@ -172,5 +166,8 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
} }
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{}) _, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
} }
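The refactor above swaps the per-experiment status/wait/cleanup sequence for a single lifecycle call and a GenerateName-based helper pod name. A rough sketch of the steps that call replaces, reconstructed from the previously inlined logic on the left-hand side; the function name manageHelperPod is illustrative, and imports are assumed to match the file shown above:

// manageHelperPod (illustrative) waits for the helper pod to run, waits for it
// to complete, and cleans it up, mirroring the removed inline logic.
func manageHelperPod(appLabel, runID string, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, clients clients.ClientSets) error {
	podName := experimentsDetails.ExperimentName + "-helper-" + runID

	// fail fast if the helper pod never reaches the Running state
	if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
		common.DeleteHelperPodBasedOnJobCleanupPolicy(podName, appLabel, chaosDetails, clients)
		return stacktrace.Propagate(err, "helper pod is not in running state")
	}

	// wait for completion, bounded by chaos duration plus the experiment timeout
	podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
	if err != nil || podStatus == "Failed" {
		common.DeleteHelperPodBasedOnJobCleanupPolicy(podName, appLabel, chaosDetails, clients)
		return common.HelperFailedError(err)
	}

	// remove the helper pod once it has finished
	return common.DeletePod(podName, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
}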

View File

@ -1,6 +1,7 @@
package lib package lib
import ( import (
"context"
"os" "os"
"os/signal" "os/signal"
"strings" "strings"
@ -14,7 +15,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types" "github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common" "github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
) )
var ( var (
@ -22,8 +22,11 @@ var (
inject, abort chan os.Signal inject, abort chan os.Signal
) )
//PrepareChaos contains the prepration and injection steps for the experiment //PrepareChaos contains the preparation and injection steps for the experiment
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Prepare[name-your-chaos]Fault")
// defer span.End()
// inject channel is used to transmit signal notifications. // inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1) inject = make(chan os.Signal, 1)
@ -46,7 +49,7 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
// THIS TEMPLATE CONTAINS THE SELECTION BY ID FOR TAG YOU NEED TO ADD/CALL A FUNCTION HERE // THIS TEMPLATE CONTAINS THE SELECTION BY ID FOR TAG YOU NEED TO ADD/CALL A FUNCTION HERE
targetIDList := strings.Split(experimentsDetails.TargetID, ",") targetIDList := strings.Split(experimentsDetails.TargetID, ",")
if experimentsDetails.TargetID == "" { if experimentsDetails.TargetID == "" {
return errors.Errorf("no target id found to perform chaos on") return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no target id found"}
} }
// watching for the abort signal and revert the chaos // watching for the abort signal and revert the chaos
@ -54,15 +57,15 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
switch strings.ToLower(experimentsDetails.Sequence) { switch strings.ToLower(experimentsDetails.Sequence) {
case "serial": case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInSerialMode(ctx, experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in serial mode")
} }
case "parallel": case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil { if err = injectChaosInParallelMode(ctx, experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err return stacktrace.Propagate(err, "could not run chaos in parallel mode")
} }
default: default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence) return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
} }
//Waiting for the ramp time after chaos injection //Waiting for the ramp time after chaos injection
@ -74,7 +77,10 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
} }
//injectChaosInSerialMode will inject the chaos on the target one after other //injectChaosInSerialMode will inject the chaos on the target one after other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Inject[name-your-chaos]FaultInSerialMode")
// defer span.End()
select { select {
case <-inject: case <-inject:
@ -112,7 +118,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration // The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 { if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
@ -137,7 +143,10 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
} }
// injectChaosInParallelMode will inject the chaos on the target all at once // injectChaosInParallelMode will inject the chaos on the target all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error { func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Inject[name-your-chaos]FaultInParallelMode")
// defer span.End()
select { select {
case <-inject: case <-inject:
@ -178,7 +187,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
// run the probes during chaos // run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil { if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err return err
} }
} }
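Each new ctx parameter above is threaded down so the @TODO tracing comments can later start per-phase spans. A minimal sketch of what those comments would expand to, assuming OpenTelemetry as hinted by the commented-out calls; the tracer name and span names are placeholders:

package lib

import (
	"context"

	"go.opentelemetry.io/otel"
)

const tracerName = "litmuschaos.io/litmus-go" // placeholder for the repo's telemetry.TracerName

// startSpan begins a child span for one fault phase and returns the derived
// context plus a function the caller should defer to end the span.
func startSpan(ctx context.Context, name string) (context.Context, func()) {
	ctx, span := otel.Tracer(tracerName).Start(ctx, name)
	return ctx, func() { span.End() }
}

With this helper, the @TODO block in PrepareChaos would become ctx, end := startSpan(ctx, "PrepareExampleFault"); defer end(), and the serial/parallel injectors would open their own child spans the same way.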

View File

@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10")) experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0")) experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", "")) experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "") experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "") experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")

View File

@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10")) experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0")) experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", "")) experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "") experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "") experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")

View File

@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10")) experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0")) experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "") experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "") experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.AppKind = types.Getenv("APP_KIND", "") experimentDetails.AppKind = types.Getenv("APP_KIND", "")

View File

@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10")) experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0")) experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", "")) experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "") experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "") experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")

View File

@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10")) experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0")) experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "") experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "") experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.AppKind = types.Getenv("APP_KIND", "") experimentDetails.AppKind = types.Getenv("APP_KIND", "")

View File

@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30")) experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10")) experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0")) experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", "")) experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "") experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "") experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
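All of the GetENV hunks above make the same change: the LIB environment variable is no longer read, while the remaining values keep the Getenv-with-default pattern. A condensed sketch of that pattern, with field names taken from the hunks and imports assumed as in the files above:

func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
	// numeric values fall back to a default string before conversion;
	// conversion errors are deliberately ignored, leaving the zero value
	experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
	experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
	// the LIB lookup has been dropped; litmus is the only supported chaos lib
	experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
	experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
}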

View File

@ -1,11 +1,11 @@
package experiment package experiment
import ( import (
"context"
"os" "os"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib" litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1" "github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients" clients "github.com/litmuschaos/litmus-go/pkg/clients"
aws "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2" aws "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
@ -20,7 +20,7 @@ import (
) )
// Experiment contains steps to inject chaos // Experiment contains steps to inject chaos
func Experiment(clients clients.ClientSets){ func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{} experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{} resultDetails := types.ResultDetails{}
@ -38,19 +38,18 @@ func Experiment(clients clients.ClientSets){
types.SetResultAttributes(&resultDetails, chaosDetails) types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet // Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil { if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err) log.Errorf("Unable to initialize the probes, err: %v", err)
return return
} }
} }
//Updating the chaos result in the beginning of experiment //Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName) log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err) log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
@ -80,8 +79,7 @@ func Experiment(clients clients.ClientSets){
log.Info("[Status]: Verify that the aws ec2 instances are in running state (pre-chaos)") log.Info("[Status]: Verify that the aws ec2 instances are in running state (pre-chaos)")
if err := aws.InstanceStatusCheckByID(experimentsDetails.TargetID, experimentsDetails.Region); err != nil { if err := aws.InstanceStatusCheckByID(experimentsDetails.TargetID, experimentsDetails.Region); err != nil {
log.Errorf("failed to get the ec2 instance status, err: %v", err) log.Errorf("failed to get the ec2 instance status, err: %v", err)
failStep := "[pre-chaos]: Failed to verify the AWS ec2 instance status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
log.Info("[Status]: EC2 instance is in running state") log.Info("[Status]: EC2 instance is in running state")
@ -93,13 +91,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check // run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err) log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -113,25 +110,17 @@ func Experiment(clients clients.ClientSets){
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS // THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE // IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB // @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib chaosDetails.Phase = types.ChaosInjectPhase
switch experimentsDetails.ChaosLib { if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
case "litmus": log.Errorf("Chaos injection failed, err: %v", err)
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil { result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
log.Errorf("Chaos injection failed, err: %v", err) return
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error() }
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName) log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK // @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE // ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@ -142,8 +131,7 @@ func Experiment(clients clients.ClientSets){
log.Info("[Status]: Verify that the aws ec2 instances are in running state (post-chaos)") log.Info("[Status]: Verify that the aws ec2 instances are in running state (post-chaos)")
if err := aws.InstanceStatusCheckByID(experimentsDetails.TargetID, experimentsDetails.Region); err != nil { if err := aws.InstanceStatusCheckByID(experimentsDetails.TargetID, experimentsDetails.Region); err != nil {
log.Errorf("failed to get the ec2 instance status, err: %v", err) log.Errorf("failed to get the ec2 instance status, err: %v", err)
failStep := "[post-chaos]: Failed to verify the AWS ec2 instance status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
log.Info("[Status]: EC2 instance is in running state (post chaos)") log.Info("[Status]: EC2 instance is in running state (post chaos)")
@ -155,13 +143,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check // run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err) log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -177,17 +164,13 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName) log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err) log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
// generating the event in chaosresult to marked the verdict as pass/fail // generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict) msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails) types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")

View File

@ -1,6 +1,7 @@
package experiment package experiment
import ( import (
"context"
"os" "os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1" "github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
@ -20,7 +21,7 @@ import (
) )
// Experiment contains steps to inject chaos // Experiment contains steps to inject chaos
func Experiment(clients clients.ClientSets){ func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{} experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{} resultDetails := types.ResultDetails{}
@ -39,19 +40,18 @@ func Experiment(clients clients.ClientSets){
types.SetResultAttributes(&resultDetails, chaosDetails) types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet // Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil { if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err) log.Errorf("Unable to initialize the probes, err: %v", err)
return return
} }
} }
//Updating the chaos result in the beginning of experiment //Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName) log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err) log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
@ -77,8 +77,7 @@ func Experiment(clients clients.ClientSets){
// Setting up Azure Subscription ID // Setting up Azure Subscription ID
if experimentsDetails.SubscriptionID, err = azureCommon.GetSubscriptionID(); err != nil { if experimentsDetails.SubscriptionID, err = azureCommon.GetSubscriptionID(); err != nil {
log.Errorf("fail to get the subscription id, err: %v", err) log.Errorf("fail to get the subscription id, err: %v", err)
failStep := "[pre-chaos]: Failed to get the subscription ID for authentication, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
@ -89,8 +88,7 @@ func Experiment(clients clients.ClientSets){
//Verify the azure target instance is running (pre-chaos) //Verify the azure target instance is running (pre-chaos)
if err := azureStatus.InstanceStatusCheckByName(experimentsDetails.TargetID, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup); err != nil { if err := azureStatus.InstanceStatusCheckByName(experimentsDetails.TargetID, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup); err != nil {
log.Errorf("failed to get the azure instance status, err: %v", err) log.Errorf("failed to get the azure instance status, err: %v", err)
failStep := "[pre-chaos]: Failed to verify the azure instance status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
log.Info("[Status]: Azure instance(s) is in running state (pre-chaos)") log.Info("[Status]: Azure instance(s) is in running state (pre-chaos)")
@ -102,13 +100,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check // run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err) log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -122,25 +119,17 @@ func Experiment(clients clients.ClientSets){
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS // THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE // IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB // @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib chaosDetails.Phase = types.ChaosInjectPhase
switch experimentsDetails.ChaosLib { if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
case "litmus": log.Errorf("Chaos injection failed, err: %v", err)
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil { result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
log.Errorf("Chaos injection failed, err: %v", err) return
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error() }
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName) log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK // @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE // ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@ -149,8 +138,7 @@ func Experiment(clients clients.ClientSets){
//Verify the azure instance is running (post chaos) //Verify the azure instance is running (post chaos)
if err := azureStatus.InstanceStatusCheckByName(experimentsDetails.TargetID, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup); err != nil { if err := azureStatus.InstanceStatusCheckByName(experimentsDetails.TargetID, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup); err != nil {
log.Errorf("failed to get the azure instance status, err: %v", err) log.Errorf("failed to get the azure instance status, err: %v", err)
failStep := "[pre-chaos]: Failed to update the azure instance status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
log.Info("[Status]: Azure instance is in running state (post chaos)") log.Info("[Status]: Azure instance is in running state (post chaos)")
@ -161,13 +149,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check // run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err) log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -183,17 +170,13 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName) log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err) log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
// generating the event in chaosresult to marked the verdict as pass/fail // generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict) msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails) types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")

View File

@ -43,9 +43,6 @@ spec:
- name: CHAOS_INTERVAL - name: CHAOS_INTERVAL
value: '' value: ''
- name: LIB
value: ''
- name: RAMP_TIME - name: RAMP_TIME
value: '' value: ''

View File

@ -1,6 +1,7 @@
package experiment package experiment
import ( import (
"context"
"os" "os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1" "github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
@ -37,19 +38,18 @@ func Experiment(clients clients.ClientSets){
types.SetResultAttributes(&resultDetails, chaosDetails) types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet // Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil { if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err) log.Errorf("Unable to initialize the probes, err: %v", err)
return return
} }
} }
//Updating the chaos result in the beginning of experiment //Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName) log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err) log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
@ -75,16 +75,14 @@ func Experiment(clients clients.ClientSets){
computeService, err := gcp.GetGCPComputeService() computeService, err := gcp.GetGCPComputeService()
if err != nil { if err != nil {
log.Errorf("failed to obtain a gcp compute service, err: %v", err) log.Errorf("failed to obtain a gcp compute service, err: %v", err)
failStep := "[pre-chaos]: Failed to obtain a gcp compute service, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
// Verify that the GCP VM instance(s) is in RUNNING state (pre-chaos) // Verify that the GCP VM instance(s) is in RUNNING state (pre-chaos)
if err := gcp.InstanceStatusCheckByName(computeService, experimentsDetails.ManagedInstanceGroup, experimentsDetails.Delay, experimentsDetails.Timeout, "pre-chaos", experimentsDetails.TargetID, experimentsDetails.GCPProjectID, experimentsDetails.InstanceZone); err != nil { if err := gcp.InstanceStatusCheckByName(computeService, experimentsDetails.ManagedInstanceGroup, experimentsDetails.Delay, experimentsDetails.Timeout, "pre-chaos", experimentsDetails.TargetID, experimentsDetails.GCPProjectID, experimentsDetails.InstanceZone); err != nil {
log.Errorf("failed to get the vm instance status, err: %v", err) log.Errorf("failed to get the vm instance status, err: %v", err)
failStep := "[pre-chaos]: Failed to verify the GCP VM instance status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
@ -101,13 +99,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check // run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err) log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -121,25 +118,18 @@ func Experiment(clients clients.ClientSets){
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS // THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE // IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB // @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib chaosDetails.Phase = types.ChaosInjectPhase
switch experimentsDetails.ChaosLib { if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
case "litmus": log.Errorf("Chaos injection failed, err: %v", err)
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil { failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
log.Errorf("Chaos injection failed, err: %v", err) result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error() return
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) }
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName) log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK // @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE // ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@ -148,8 +138,7 @@ func Experiment(clients clients.ClientSets){
//Verify the GCP VM instance is in RUNNING status (post-chaos) //Verify the GCP VM instance is in RUNNING status (post-chaos)
if err := gcp.InstanceStatusCheckByName(computeService, experimentsDetails.ManagedInstanceGroup, experimentsDetails.Delay, experimentsDetails.Timeout, "post-chaos", experimentsDetails.TargetID, experimentsDetails.GCPProjectID, experimentsDetails.InstanceZone); err != nil { if err := gcp.InstanceStatusCheckByName(computeService, experimentsDetails.ManagedInstanceGroup, experimentsDetails.Delay, experimentsDetails.Timeout, "post-chaos", experimentsDetails.TargetID, experimentsDetails.GCPProjectID, experimentsDetails.InstanceZone); err != nil {
log.Errorf("failed to get the vm instance status, err: %v", err) log.Errorf("failed to get the vm instance status, err: %v", err)
failStep := "[post-chaos]: Failed to verify the GCP VM instance status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
@ -161,13 +150,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check // run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err) log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -183,17 +171,13 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName) log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err) log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
// generating the event in chaosresult to marked the verdict as pass/fail // generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict) msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails) types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
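
Across these experiment files the diff replaces the failStep strings with the error itself in result.RecordAfterFailure and swaps the inline pass/fail if-else for types.GetChaosResultVerdictEvent. Below is a minimal, self-contained sketch of what such a verdict-to-event helper boils down to, using hypothetical local names rather than the real litmus-go/chaos-operator types:

package main

import "fmt"

// Illustrative stand-ins only; the real verdict and event constants live in
// the chaos-operator and litmus-go types packages.
type verdict string

const verdictPassed verdict = "Pass"

// getChaosResultVerdictEvent mirrors the intent of the new
// types.GetChaosResultVerdictEvent helper: map a verdict to an event reason
// and event type in one place instead of repeating the if/else everywhere.
func getChaosResultVerdictEvent(v verdict) (reason, eventType string) {
	if v == verdictPassed {
		return "PassVerdict", "Normal"
	}
	return "FailVerdict", "Warning"
}

func main() {
	reason, eventType := getChaosResultVerdictEvent("Fail")
	fmt.Println(reason, eventType) // prints: FailVerdict Warning
}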


@ -1,6 +1,7 @@
package experiment package experiment
import ( import (
"context"
"os" "os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1" "github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
@ -19,7 +20,7 @@ import (
) )
// Experiment contains steps to inject chaos // Experiment contains steps to inject chaos
func Experiment(clients clients.ClientSets){ func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{} experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{} resultDetails := types.ResultDetails{}
@ -37,19 +38,18 @@ func Experiment(clients clients.ClientSets){
types.SetResultAttributes(&resultDetails, chaosDetails) types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet // Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil { if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err) log.Errorf("Unable to initialize the probes, err: %v", err)
return return
} }
} }
//Updating the chaos result in the beginning of experiment //Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName) log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err) log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
@ -78,25 +78,23 @@ func Experiment(clients clients.ClientSets){
//PRE-CHAOS APPLICATION STATUS CHECK //PRE-CHAOS APPLICATION STATUS CHECK
if chaosDetails.DefaultHealthCheck { if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)") log.Info("[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)")
if err := status.AUTStatusCheck(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.TargetContainer, experimentsDetails.Timeout, experimentsDetails.Delay, clients, &chaosDetails); err != nil { if err := status.AUTStatusCheck(clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err) log.Errorf("Application status check failed, err: %v", err)
failStep := "[pre-chaos]: Failed to verify that the AUT (Application Under Test) is in running state, err: " + err.Error() types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, "AUT: Not Running", "Warning", &chaosDetails) events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) return
return }
}
} }
{{ if eq .AuxiliaryAppCheck true }} {{ if eq .AuxiliaryAppCheck true }}
//PRE-CHAOS AUXILIARY APPLICATION STATUS CHECK //PRE-CHAOS AUXILIARY APPLICATION STATUS CHECK
if experimentsDetails.AuxiliaryAppInfo != "" { if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running (pre-chaos)") log.Info("[Status]: Verify that the Auxiliary Applications are running (pre-chaos)")
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients);err != nil { if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Errorf("Auxiliary Application status check failed, err: %v", err) log.Errorf("Auxiliary Application status check failed, err: %v", err)
failStep := "[pre-chaos]: Failed to verify that the Auxiliary Applications are in running state, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) return
return }
}
} }
{{- end }} {{- end }}
@ -107,13 +105,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check // run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err) log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -128,24 +125,16 @@ func Experiment(clients clients.ClientSets){
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE // IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB // @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib chaosDetails.Phase = types.ChaosInjectPhase
switch experimentsDetails.ChaosLib { if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
case "litmus": log.Errorf("Chaos injection failed, err: %v", err)
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil { result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
log.Errorf("Chaos injection failed, err: %v", err) return
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error() }
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName) log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK // @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE // ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@ -154,12 +143,11 @@ func Experiment(clients clients.ClientSets){
//POST-CHAOS APPLICATION STATUS CHECK //POST-CHAOS APPLICATION STATUS CHECK
if chaosDetails.DefaultHealthCheck { if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (post-chaos)") log.Info("[Status]: Verify that the AUT (Application Under Test) is running (post-chaos)")
if err := status.AUTStatusCheck(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.TargetContainer, experimentsDetails.Timeout, experimentsDetails.Delay, clients, &chaosDetails); err != nil { if err := status.AUTStatusCheck(clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err) log.Errorf("Application status check failed, err: %v", err)
failStep := "[post-chaos]: Failed to verify that the AUT (Application Under Test) is running, err: " + err.Error()
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, "AUT: Not Running", "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
} }
@ -167,10 +155,9 @@ func Experiment(clients clients.ClientSets){
//POST-CHAOS AUXILIARY APPLICATION STATUS CHECK //POST-CHAOS AUXILIARY APPLICATION STATUS CHECK
if experimentsDetails.AuxiliaryAppInfo != "" { if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running (post-chaos)") log.Info("[Status]: Verify that the Auxiliary Applications are running (post-chaos)")
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients);err != nil { if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Errorf("Auxiliary Application status check failed, err: %v", err) log.Errorf("Auxiliary Application status check failed, err: %v", err)
failStep := "[post-chaos]: Failed to verify that the Auxiliary Applications are running, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
} }
@ -182,13 +169,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check // run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err) log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -204,17 +190,13 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName) log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err) log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
// generating the event in chaosresult to marked the verdict as pass/fail // generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict) msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails) types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
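
This template also picks up the context threading seen throughout the diff: Experiment now accepts a context.Context and forwards it to probe.RunProbes and the chaoslib. A small illustrative sketch of that pattern with stand-in functions (only the ctx plumbing is taken from the diff; everything else here is hypothetical):

package main

import (
	"context"
	"fmt"
	"time"
)

// runProbes is a stand-in for probe.RunProbes: it receives the caller's
// context so probe execution can be cancelled or time-bounded end-to-end.
func runProbes(ctx context.Context, phase string) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
		fmt.Printf("running %s probes\n", phase)
		return nil
	}
}

// experiment mirrors the new Experiment(ctx, clients) shape: the context is
// accepted once at the top and threaded into every downstream call.
func experiment(ctx context.Context) error {
	if err := runProbes(ctx, "PreChaos"); err != nil {
		return err
	}
	// chaos injection would run here, also receiving ctx
	return runProbes(ctx, "PostChaos")
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := experiment(ctx); err != nil {
		fmt.Println("experiment failed:", err)
	}
}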


@ -45,11 +45,6 @@ spec:
- name: RAMP_TIME - name: RAMP_TIME
value: '' value: ''
## env var that describes the library used to execute the chaos
## default: litmus. Supported values: litmus, powerfulseal, chaoskube
- name: LIB
value: ''
# provide the chaos namespace # provide the chaos namespace
- name: CHAOS_NAMESPACE - name: CHAOS_NAMESPACE
value: '' value: ''


@ -1,8 +1,10 @@
package experiment package experiment
import ( import (
"context"
"os" "os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1" "github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib" litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients" clients "github.com/litmuschaos/litmus-go/pkg/clients"
@ -19,7 +21,7 @@ import (
) )
// Experiment contains steps to inject chaos // Experiment contains steps to inject chaos
func Experiment(clients clients.ClientSets){ func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{} experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{} resultDetails := types.ResultDetails{}
@ -48,8 +50,7 @@ func Experiment(clients clients.ClientSets){
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName) log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err) log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
@ -74,8 +75,7 @@ func Experiment(clients clients.ClientSets){
// GET SESSION ID TO LOGIN TO VCENTER // GET SESSION ID TO LOGIN TO VCENTER
cookie, err := vmware.GetVcenterSessionID(experimentsDetails.VcenterServer, experimentsDetails.VcenterUser, experimentsDetails.VcenterPass) cookie, err := vmware.GetVcenterSessionID(experimentsDetails.VcenterServer, experimentsDetails.VcenterUser, experimentsDetails.VcenterPass)
if err != nil { if err != nil {
failStep := "[pre-chaos]: Failed to obtain the Vcenter session ID, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
log.Errorf("Vcenter Login failed, err: %v", err) log.Errorf("Vcenter Login failed, err: %v", err)
return return
} }
@ -87,8 +87,7 @@ func Experiment(clients clients.ClientSets){
// PRE-CHAOS VM STATUS CHECK // PRE-CHAOS VM STATUS CHECK
if err := vmware.VMStatusCheck(experimentsDetails.VcenterServer, experimentsDetails.TargetID, cookie); err != nil { if err := vmware.VMStatusCheck(experimentsDetails.VcenterServer, experimentsDetails.TargetID, cookie); err != nil {
log.Errorf("Failed to get the VM status, err: %v", err) log.Errorf("Failed to get the VM status, err: %v", err)
failStep := "[pre-chaos]: Failed to verify the VM status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
log.Info("[Verification]: VMs are in running state (pre-chaos)") log.Info("[Verification]: VMs are in running state (pre-chaos)")
@ -100,13 +99,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check // run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err) log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -120,25 +118,18 @@ func Experiment(clients clients.ClientSets){
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS // THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE // IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB // @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib chaosDetails.Phase = types.ChaosInjectPhase
switch experimentsDetails.ChaosLib { if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
case "litmus": log.Errorf("Chaos injection failed, err: %v", err)
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil { failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
log.Errorf("Chaos injection failed, err: %v", err) result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error() return
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) }
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName) log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK // @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE // ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@ -148,8 +139,7 @@ func Experiment(clients clients.ClientSets){
log.Info("[Status]: Verify that the IUT (Instance Under Test) is running (post-chaos)") log.Info("[Status]: Verify that the IUT (Instance Under Test) is running (post-chaos)")
if err := vmware.VMStatusCheck(experimentsDetails.VcenterServer, experimentsDetails.TargetID, cookie); err != nil { if err := vmware.VMStatusCheck(experimentsDetails.VcenterServer, experimentsDetails.TargetID, cookie); err != nil {
log.Errorf("Failed to get the VM status, err: %v", err) log.Errorf("Failed to get the VM status, err: %v", err)
failStep := "[post-chaos]: Failed to get the VM status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
log.Info("[Verification]: VMs are in running state (post-chaos)") log.Info("[Verification]: VMs are in running state (post-chaos)")
@ -160,13 +150,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check // run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err) log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -182,6 +171,7 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName) log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err) log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
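
As in the other experiments, the switch on experimentsDetails.ChaosLib is gone: the litmus chaoslib is invoked directly, with chaosDetails.Phase set to the inject phase before the call and to the post-chaos phase once injection succeeds. A compact sketch of that control flow, using stand-in types in place of the litmus-go ones:

package main

import (
	"errors"
	"fmt"
)

// Stand-in phase markers; the real ones are types.ChaosInjectPhase and
// types.PostChaosPhase in litmus-go.
type phase string

const (
	chaosInjectPhase phase = "ChaosInject"
	postChaosPhase   phase = "PostChaos"
)

type chaosDetails struct{ Phase phase }

// prepareChaos is a stand-in for litmusLIB.PrepareChaos.
func prepareChaos() error { return errors.New("helper pod failed") }

// inject shows the simplified flow: no LIB switch, just mark the phase,
// call the chaoslib directly, and bubble the error up on failure.
func inject(cd *chaosDetails) error {
	cd.Phase = chaosInjectPhase
	if err := prepareChaos(); err != nil {
		return fmt.Errorf("chaos injection failed: %w", err)
	}
	cd.Phase = postChaosPhase
	return nil
}

func main() {
	cd := &chaosDetails{}
	if err := inject(cd); err != nil {
		fmt.Println(err)
	}
}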


@ -5,7 +5,7 @@ import (
) )
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE // ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT // FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details // ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct { type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int ChaosDuration int
ChaosInterval int ChaosInterval int
RampTime int RampTime int
ChaosLib string
ChaosUID clientTypes.UID ChaosUID clientTypes.UID
InstanceID string InstanceID string
ChaosNamespace string ChaosNamespace string


@ -5,7 +5,7 @@ import (
) )
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE // ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT // FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details // ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct { type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int ChaosDuration int
ChaosInterval int ChaosInterval int
RampTime int RampTime int
ChaosLib string
ChaosUID clientTypes.UID ChaosUID clientTypes.UID
InstanceID string InstanceID string
ChaosNamespace string ChaosNamespace string


@ -5,7 +5,7 @@ import (
) )
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE // ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT // FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details // ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct { type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int ChaosDuration int
ChaosInterval int ChaosInterval int
RampTime int RampTime int
ChaosLib string
AppNS string AppNS string
AppLabel string AppLabel string
AppKind string AppKind string
@ -31,4 +30,5 @@ type ExperimentDetails struct {
PodsAffectedPerc int PodsAffectedPerc int
TargetPods string TargetPods string
LIBImagePullPolicy string LIBImagePullPolicy string
IsTargetContainerProvided bool
} }
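
The generated ExperimentDetails structs drop the ChaosLib field and gain IsTargetContainerProvided. Purely as a hypothetical illustration (the actual litmus-go environment parsing may set this differently), such a flag is typically derived while reading the experiment ENVs:

package main

import (
	"fmt"
	"os"
)

// Hypothetical, trimmed-down details struct for illustration only.
type experimentDetails struct {
	TargetContainer           string
	IsTargetContainerProvided bool
}

// getENV sketches one plausible way the flag could be populated: record
// whether the user explicitly supplied TARGET_CONTAINER.
func getENV(details *experimentDetails) {
	details.TargetContainer = os.Getenv("TARGET_CONTAINER")
	details.IsTargetContainerProvided = details.TargetContainer != ""
}

func main() {
	d := &experimentDetails{}
	getENV(d)
	fmt.Printf("target container provided: %v\n", d.IsTargetContainerProvided)
}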


@ -5,7 +5,7 @@ import (
) )
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE // ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT // FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details // ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct { type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int ChaosDuration int
ChaosInterval int ChaosInterval int
RampTime int RampTime int
ChaosLib string
ChaosUID clientTypes.UID ChaosUID clientTypes.UID
InstanceID string InstanceID string
ChaosNamespace string ChaosNamespace string


@ -5,7 +5,7 @@ import (
) )
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE // ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT // FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details // ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct { type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int ChaosDuration int
ChaosInterval int ChaosInterval int
RampTime int RampTime int
ChaosLib string
AppNS string AppNS string
AppLabel string AppLabel string
AppKind string AppKind string
@ -32,4 +31,5 @@ type ExperimentDetails struct {
LIBImage string LIBImage string
SetHelperData string SetHelperData string
ChaosServiceAccount string ChaosServiceAccount string
IsTargetContainerProvided bool
} }


@ -5,7 +5,7 @@ import (
) )
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE // ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT // FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details // ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct { type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int ChaosDuration int
ChaosInterval int ChaosInterval int
RampTime int RampTime int
ChaosLib string
ChaosUID clientTypes.UID ChaosUID clientTypes.UID
InstanceID string InstanceID string
ChaosNamespace string ChaosNamespace string


@ -1,13 +1,14 @@
package experiment package experiment
import ( import (
"context"
"os" "os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1" "github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib/ssm" litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib/ssm"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/environment" experimentEnv "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types" experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients" "github.com/litmuschaos/litmus-go/pkg/clients"
ec2 "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2" ec2 "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm" "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/events" "github.com/litmuschaos/litmus-go/pkg/events"
@ -20,7 +21,7 @@ import (
) )
// AWSSSMChaosByID inject the ssm chaos on ec2 instance // AWSSSMChaosByID inject the ssm chaos on ec2 instance
func AWSSSMChaosByID(clients clients.ClientSets) { func AWSSSMChaosByID(ctx context.Context, clients clients.ClientSets) {
experimentsDetails := experimentTypes.ExperimentDetails{} experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{} resultDetails := types.ResultDetails{}
@ -38,9 +39,9 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
types.SetResultAttributes(&resultDetails, chaosDetails) types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet // Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil { if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err) log.Errorf("Unable to initialize the probes: %v", err)
return return
} }
} }
@ -48,9 +49,8 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
//Updating the chaos result in the beginning of experiment //Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName) log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT"); err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT"); err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err) log.Errorf("Unable to create the chaosresult: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of ec2 terminate experiment (SOT), err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
@ -60,8 +60,9 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
// generating the event in chaosresult to marked the verdict as awaited // generating the event in chaosresult to marked the verdict as awaited
msg := "experiment: " + experimentsDetails.ExperimentName + ", Result: Awaited" msg := "experiment: " + experimentsDetails.ExperimentName + ", Result: Awaited"
types.SetResultEventAttributes(&eventsDetails, types.AwaitedVerdict, msg, "Normal", &resultDetails) types.SetResultEventAttributes(&eventsDetails, types.AwaitedVerdict, msg, "Normal", &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult") if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosresult", types.AwaitedVerdict)
}
// Calling AbortWatcher go routine, it will continuously watch for the abort signal and generate the required events and result // Calling AbortWatcher go routine, it will continuously watch for the abort signal and generate the required events and result
go common.AbortWatcherWithoutExit(experimentsDetails.ExperimentName, clients, &resultDetails, &chaosDetails, &eventsDetails) go common.AbortWatcherWithoutExit(experimentsDetails.ExperimentName, clients, &resultDetails, &chaosDetails, &eventsDetails)
@ -80,73 +81,67 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
// run the probes in the pre-chaos check // run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails); err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails); err != nil {
log.Errorf("Probe Failed, err: %v", err) log.Errorf("Probe Failed: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) log.Errorf("Failed to create %v event inside chaosengine", types.PreChaosCheck)
}
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
} }
// generating the events for the pre-chaos check // generating the events for the pre-chaos check
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Normal", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosengine", types.PreChaosCheck)
}
} }
//Verify that the instance should have permission to perform ssm api calls //Verify that the instance should have permission to perform ssm api calls
if err := ssm.CheckInstanceInformation(&experimentsDetails); err != nil { if err := ssm.CheckInstanceInformation(&experimentsDetails); err != nil {
log.Errorf("failed perform ssm api calls, err: %v", err) log.Errorf("Failed perform ssm api calls: %v", err)
failStep := "[pre-chaos]: Failed to verify to make SSM api calls, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
if chaosDetails.DefaultHealthCheck { if chaosDetails.DefaultHealthCheck {
//Verify the aws ec2 instance is running (pre chaos) //Verify the aws ec2 instance is running (pre chaos)
if err := ec2.InstanceStatusCheckByID(experimentsDetails.EC2InstanceID, experimentsDetails.Region); err != nil { if err := ec2.InstanceStatusCheckByID(experimentsDetails.EC2InstanceID, experimentsDetails.Region); err != nil {
log.Errorf("failed to get the ec2 instance status, err: %v", err) log.Errorf("Failed to get the ec2 instance status: %v", err)
failStep := "[pre-chaos]: Failed to verify the AWS ec2 instance status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
log.Info("[Status]: EC2 instance is in running state") log.Info("[Status]: EC2 instance is in running state")
} }
// Including the litmus lib for aws-ssm-chaos-by-id chaosDetails.Phase = types.ChaosInjectPhase
switch experimentsDetails.ChaosLib {
case "litmus": if err := litmusLIB.PrepareAWSSSMChaosByID(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
if err := litmusLIB.PrepareAWSSSMChaosByID(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil { log.Errorf("Chaos injection failed: %v", err)
log.Errorf("Chaos injection failed, err: %v", err) result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error() //Delete the ssm document on the given aws service monitoring docs
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) if experimentsDetails.IsDocsUploaded {
//Delete the ssm document on the given aws service monitoring docs log.Info("[Recovery]: Delete the uploaded aws ssm docs")
if experimentsDetails.IsDocsUploaded { if err := ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region); err != nil {
log.Info("[Recovery]: Delete the uploaded aws ssm docs") log.Errorf("Failed to delete ssm doc: %v", err)
if err := ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region); err != nil {
log.Errorf("fail to delete ssm doc, err: %v", err)
}
} }
return
} }
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match was found for the specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName) log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
if chaosDetails.DefaultHealthCheck { if chaosDetails.DefaultHealthCheck {
//Verify the aws ec2 instance is running (post chaos) //Verify the aws ec2 instance is running (post chaos)
if err := ec2.InstanceStatusCheckByID(experimentsDetails.EC2InstanceID, experimentsDetails.Region); err != nil { if err := ec2.InstanceStatusCheckByID(experimentsDetails.EC2InstanceID, experimentsDetails.Region); err != nil {
log.Errorf("failed to get the ec2 instance status, err: %v", err) log.Errorf("Failed to get the ec2 instance status: %v", err)
failStep := "[post-chaos]: Failed to verify the AWS ec2 instance status, err: " + err.Error() result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return return
} }
log.Info("[Status]: EC2 instance is in running state (post chaos)") log.Info("[Status]: EC2 instance is in running state (post chaos)")
@ -158,13 +153,14 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
// run the probes in the post-chaos check // run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 { if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails); err != nil { if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails); err != nil {
log.Errorf("Probes Failed, err: %v", err) log.Errorf("Probes Failed: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful" msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails) log.Errorf("Failed to create %v event inside chaosengine", types.PostChaosCheck)
}
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return return
} }
msg = "AUT: Running, Probes: Successful" msg = "AUT: Running, Probes: Successful"
@ -172,31 +168,30 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
// generating post chaos event // generating post chaos event
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Normal", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosengine", types.PostChaosCheck)
}
} }
//Updating the chaosResult in the end of experiment //Updating the chaosResult in the end of experiment
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName) log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT"); err != nil { if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT"); err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err) log.Errorf("Unable to update the chaosresult: %v", err)
return return
} }
// generating the event in chaosresult to marked the verdict as pass/fail // generating the event in chaosresult to marked the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict) msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails) types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult") if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosresult", reason)
}
if experimentsDetails.EngineName != "" { if experimentsDetails.EngineName != "" {
msg := experimentsDetails.ExperimentName + " experiment has been " + string(resultDetails.Verdict) + "ed" msg := experimentsDetails.ExperimentName + " experiment has been " + string(resultDetails.Verdict) + "ed"
types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails) types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine") if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosengine", types.Summary)
}
} }
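
A recurring change in this file is that errors returned by events.GenerateEvents are now checked and logged instead of being silently discarded, while the experiment itself keeps going. A minimal sketch of that pattern with a stand-in event generator:

package main

import (
	"errors"
	"log"
)

// generateEvents is a stand-in for events.GenerateEvents, which now returns
// an error that callers are expected to inspect.
func generateEvents(kind string) error {
	return errors.New("kube-apiserver unreachable")
}

func main() {
	// The pattern from the diff: log event-creation failures so missing
	// events are diagnosable, but do not abort the experiment over them.
	if eventErr := generateEvents("ChaosResult"); eventErr != nil {
		log.Printf("Failed to create event inside chaosresult: %v", eventErr)
	}
}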
} }

Some files were not shown because too many files have changed in this diff.