Compare commits

...

112 Commits

Author SHA1 Message Date
Neelanjan Manna e7b4e7dbe4
chore: adds retries with timeout for litmus and k8s client operations (#766)
* chore: adds retries for k8s api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds retries for litmus api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 15:41:34 +05:30
Neelanjan Manna 62a4986c78
chore: adds common functions for helper pod lifecycle management (#764)
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 12:18:29 +05:30
Neelanjan Manna d626cf3ec4
Merge pull request #761 from litmuschaos/CHAOS-9404
feat: adds port filtering for ip/hostnames for network faults, adds pod-network-rate-limit fault
2025-08-13 16:40:51 +05:30
neelanjan00 59125424c3
feat: adds ip+port filtering, adds pod-network-rate-limit fault
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-13 16:13:24 +05:30
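Faults like pod-network-rate-limit typically drive `tc` inside the target's network namespace. The sketch below only illustrates how such commands could be assembled when both an IP and a port filter are requested; the device name, rate, qdisc handles, and u32 match layout are assumptions for illustration, not the fault's actual command set.

```go
package main

import "fmt"

// tcRateLimitCommands sketches the tc invocations a network-rate-limit
// fault could run: a prio qdisc as root, a tbf qdisc on one band to cap
// bandwidth, and a u32 filter steering only the chosen ip:port traffic
// into that band so unrelated traffic is untouched.
func tcRateLimitCommands(dev, rate, destIP string, destPort int) []string {
	cmds := []string{
		fmt.Sprintf("tc qdisc add dev %s root handle 1: prio", dev),
		fmt.Sprintf("tc qdisc add dev %s parent 1:3 handle 30: tbf rate %s burst 32kbit latency 400ms", dev, rate),
	}
	match := fmt.Sprintf("match ip dst %s", destIP)
	if destPort > 0 {
		// Chain the ip and port matches so only this destination pair
		// is rate-limited (u32 dport match uses a 16-bit mask).
		match += fmt.Sprintf(" match ip dport %d 0xffff", destPort)
	}
	cmds = append(cmds, fmt.Sprintf("tc filter add dev %s protocol ip parent 1:0 prio 1 u32 %s flowid 1:3", dev, match))
	return cmds
}

func main() {
	for _, c := range tcRateLimitCommands("eth0", "1mbit", "10.0.0.5/32", 8080) {
		fmt.Println(c)
	}
}
```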
Neelanjan Manna 2e7ff836fc
feat: Adds multi container support for pod stress faults (#757)
* chore: Fix typo in log statement

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds multi-container stress chaos system with improved lifecycle management and better error handling

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-08-13 16:04:20 +05:30
Prexy e61d5b33be
written test for `workload.go` in `pkg/workloads` (#767)
* written test for workload.go in pkg/workloads

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* checking go formatting

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-08-12 17:30:22 +05:30
Prexy 14fe30c956
test: add unit tests for exec.go file in pkg/utils folder (#755)
* test: add unit tests for exec.go file in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* creating table driven test and also updates TestCheckPodStatus

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-07-24 15:33:25 +05:30
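The table-driven style mentioned in this PR can be shown in miniature. `isValidPhase` below is a hypothetical stand-in for a small `pkg/utils` helper; the real tests exercise `exec.go` and `CheckPodStatus`, but the case-table shape is the same.

```go
package main

import "fmt"

// isValidPhase mimics a tiny pkg/utils-style predicate under test.
func isValidPhase(phase string) bool {
	return phase == "Running" || phase == "Succeeded"
}

func main() {
	// Table-driven style: each case carries a name, an input, and an
	// expected result, so a failure identifies exactly which case broke.
	cases := []struct {
		name string
		in   string
		want bool
	}{
		{"running pod", "Running", true},
		{"succeeded pod", "Succeeded", true},
		{"pending pod", "Pending", false},
	}
	for _, c := range cases {
		if got := isValidPhase(c.in); got != c.want {
			fmt.Printf("case %q: got %v, want %v\n", c.name, got, c.want)
		} else {
			fmt.Printf("case %q: ok\n", c.name)
		}
	}
}
```

In a real `_test.go` file each case would run under `t.Run(c.name, ...)` so `go test` reports sub-test names.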
Prexy 4ae08899e0
test: add unit tests for retry.go in pkg/utils folder (#754)
* test: add unit tests for retry.go in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:42 +05:30
Prexy 2c38220cca
test: add unit tests for RandStringBytesMask and GetRunID in stringutils (#753)
* test: add unit tests for RandStringBytesMask and GetRunID in stringutils

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:26 +05:30
Sami S. 07de11eeee
Fix: handle pagination in ssm describeInstanceInformation & API Rate Limit (#738)
* Fix: handle pagination in ssm describe

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* implement exponential backoff with jitter for API rate limiting

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Refactor

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixup

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Fix: include error message from stderr if container-kill fails (#740) (#741)

Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(logs): Fix the error logs for container-kill fault (#745)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(container-kill): Fixed the container stop command timeout issue (#747)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* feat: Add a rds-instance-stop chaos fault (#710)

* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix go fmt ./...

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Filter instances on api call

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixes lint

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>

---------

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>
Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Jongwoo Han <jongwooo.han@gmail.com>
Co-authored-by: Udit Gaurav <udit.gaurav@harness.io>
2025-04-30 10:25:10 +05:30
Jongwoo Han 5c22472290
feat: Add a rds-instance-stop chaos fault (#710)
* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2025-04-24 12:54:05 +05:30
Shubham Chaudhary e7b3fb6f9f
fix(container-kill): Fixed the container stop command timeout issue (#747)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-04-15 18:20:23 +05:30
Shubham Chaudhary e1eaea9110
fix(logs): Fix the error logs for container-kill fault (#745)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-04-03 15:35:00 +05:30
Björn Kylberg 491dc5e31a
Fix: include error message from stderr if container-kill fails (#740) (#741)
Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
2025-04-03 14:44:05 +05:30
Shubham Chaudhary caae228e35
(chore): fix the go fmt of the files (#734)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-01-17 12:08:34 +05:30
kbfu 34a62d87f3
fix the cgroup 2 problem (#677)
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-01-17 11:29:30 +05:30
Suhyen Im 8246ff891b
feat: propagate trace context to helper pods (#722)
Signed-off-by: Suhyen Im <suhyenim.kor@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Saranya Jena <saranya.jena@harness.io>
2025-01-15 16:34:19 +05:30
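Since helper pods are separate processes, trace context cannot be passed in-process; one way to propagate it is to hand the W3C `traceparent` value over as an environment variable. The sketch below illustrates that hand-off pattern only; the env var name `TRACE_PARENT` and both function names are assumptions, not the litmus-go API.

```go
package main

import (
	"fmt"
	"os"
)

// traceParentEnv carries the W3C traceparent value across the process
// boundary between the experiment pod and its helper pods.
const traceParentEnv = "TRACE_PARENT"

// injectTraceParent returns the env entries for a helper pod, appending
// the caller's traceparent (if any) so the helper joins the same trace.
func injectTraceParent(env []string, traceparent string) []string {
	if traceparent != "" {
		env = append(env, traceParentEnv+"="+traceparent)
	}
	return env
}

// extractTraceParent is the helper-side counterpart: it reads the value
// back, returning "" when no trace context was propagated.
func extractTraceParent() string {
	return os.Getenv(traceParentEnv)
}

func main() {
	env := injectTraceParent([]string{"CHAOS_DURATION=60"},
		"00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(env)
	fmt.Println(extractTraceParent())
}
```

The helper can then feed the extracted value into its tracer so its spans are parented under the experiment's span.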
Namkyu Park 9b29558585
feat: export k6 results output to the OTEL collector (#726)
* Export k6 results to the otel collector

Signed-off-by: namkyu1999 <lak9348@gmail.com>

* add envs for multiple projects

Signed-off-by: namkyu1999 <lak9348@gmail.com>

---------

Signed-off-by: namkyu1999 <lak9348@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Saranya Jena <saranya.jena@harness.io>
2025-01-15 16:33:43 +05:30
Sayan Mondal c7ab5a3d7c
Merge pull request #732 from heysujal/add-openssh-clients
add openssh-clients to dockerfile
2025-01-15 11:28:17 +05:30
Shubham Chaudhary 3bef3ad67e
Merge branch 'master' into add-openssh-clients 2025-01-15 10:57:02 +05:30
Sujal Gupta b2f68a6ad1
use revertErr instead of err (#730)
Signed-off-by: Sujal Gupta <sujalgupta6100@gmail.com>
2025-01-15 10:38:32 +05:30
Sujal Gupta cd2ec26083 add openssh-clients to dockerfile
Signed-off-by: Sujal Gupta <sujalgupta6100@gmail.com>
2025-01-06 01:04:25 +05:30
Shubham Chaudhary 7e08c69750
chore(stress): Fix the stress faults (#723)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-11-20 15:18:59 +05:30
Namkyu Park 3ef23b01f9
feat: implement opentelemetry for distributed tracing (#706)
* feat: add otel & tracing for distributed tracing

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* feat: add tracing codes to chaoslib

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: misc

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: make otel optional

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: skip if litmus-go not received trace_parent

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: Set context.Context as a parameter in each function

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* update templates

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* feat: rename spans and enhance coverage

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: avoid shadowing

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: add logs

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: add logs

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: fix templates

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

---------

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-10-24 16:14:57 +05:30
Shubham Chaudhary 0cd6c6fae3
(chore): Fix the build, push, and release pipelines (#716)
* (chore): Fix the build, push, and release pipelines

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* (chore): Fix the dockerfile

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-10-15 23:33:54 +05:30
Shubham Chaudhary 6a386d1410
(chore): Fix the disk-fill fault (#715)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-10-15 22:15:14 +05:30
Vedant Shrotria fc646d678c
Merge pull request #707 from dusdjhyeon/ubi-migration
UBI migration of Images - go-runner
2024-08-23 11:32:44 +05:30
dusdjhyeon 6257c1abb8
feat: add build arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 16:13:18 +09:00
dusdjhyeon 755a562efe
Merge branch 'ubi-migration' of https://github.com/dusdjhyeon/litmus-go into ubi-migration 2024-08-22 16:10:37 +09:00
dusdjhyeon d0814df9ea
fix: set build args
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 16:09:40 +09:00
Vedant Shrotria a6012039fd
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:42 +05:30
Vedant Shrotria a1f602ba98
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:33 +05:30
Vedant Shrotria 7476994a36
Update .github/workflows/run-e2e-on-pr-commits.yml 2024-08-22 11:19:25 +05:30
Vedant Shrotria 3440fb84eb
Update .github/workflows/release.yml 2024-08-22 11:18:46 +05:30
Vedant Shrotria 652e6b8465
Update .github/workflows/release.yml 2024-08-22 11:18:39 +05:30
Vedant Shrotria 996f3b3f5f
Update .github/workflows/push.yml 2024-08-22 11:18:10 +05:30
Vedant Shrotria e73f3bfb21
Update .github/workflows/push.yml 2024-08-22 11:17:54 +05:30
Vedant Shrotria 054d091dce
Update .github/workflows/build.yml 2024-08-22 11:17:37 +05:30
Vedant Shrotria c362119e05
Update .github/workflows/build.yml 2024-08-22 11:17:15 +05:30
dusdjhyeon 31bf293140
fix: change go version and others
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-22 14:39:17 +09:00
Vedant Shrotria 9569c8b2f4
Merge branch 'master' into ubi-migration 2024-08-21 16:25:14 +05:30
dusdjhyeon 4f9f4e0540
fix: upgrade version for vulnerability
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:58 +09:00
dusdjhyeon 399ccd68a0
fix: change kubectl crictl latest version
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:58 +09:00
Jongwoo Han 35958eae38
Rename env to EC2_INSTANCE_TAG (#708)
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon 003a3dc02c
fix: change docker repo
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon d4eed32a6d
fix: change version arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon af7322bece
fix: app_dir and yum
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon bd853f6e25
feat: migration base image
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
dusdjhyeon cfdb205ca3
fix: typos and add arg
Signed-off-by: dusdjhyeon <dusdj0813@gmail.com>
2024-08-21 10:16:57 +09:00
Jongwoo Han f051d5ac7c
Rename env to EC2_INSTANCE_TAG (#708)
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2024-08-14 16:42:35 +05:30
Andrii Kotelnikov 10e9b774a8
Update workloads.go (#705)
Fix issue with empty kind field
Signed-off-by: Andrii Kotelnikov <andrusha@ukr.net>
2024-06-14 14:16:47 +05:30
Vedant Shrotria 9689f74fce
Merge pull request #701 from Jonsy13/add-gitleaks
Adding `gitleaks` as PR Check
2024-05-20 10:27:09 +05:30
Vedant Shrotria d273ba628b
Merge branch 'master' into add-gitleaks 2024-05-17 17:37:15 +05:30
Jonsy13 2315eaf2a4
Added gitleaks
Signed-off-by: Jonsy13 <vedant.shrotria@harness.io>
2024-05-17 17:34:36 +05:30
Shubham Chaudhary f2b2c2747a
chore(io-stress): Fix the pod-io-stress experiment (#700)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-05-17 16:43:19 +05:30
Udit Gaurav 66d01011bb
Fix pipeline issues (#694)
Fix pipeline issues

---------

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
2024-04-26 14:17:01 +05:30
Udit Gaurav a440615a51
Fix gofmt issues (#695) 2024-04-25 23:45:59 +05:30
Shubham Chaudhary 78eec36b79
chore(probe): Fix the probe description on failure (#692)
* chore(probe): Fix the probe description on failure

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(probe): Consider http timeout as probe failure

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-04-23 18:06:48 +05:30
Michael Morris b5a24b4044
enable ALL for TARGET_CONTAINER (#683)
Signed-off-by: MichaelMorris <michael.morris@est.tech>
2024-03-14 19:44:18 +05:30
Shubham Chaudhary 6d26c21506
test: Adding fuzz testing for common util (#691)
* test: Adding fuzz testing for common util

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* fix the random interval test

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-03-12 17:02:01 +05:30
Namkyu Park 5554a29ea2
chore: fix typos (#690)
Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-03-11 20:26:50 +05:30
Sayan Mondal 5f0d882912
test: Adding fuzz testing for common util (#688) 2024-03-08 21:42:20 +05:30
Namkyu Park eef3b4021d
feat: Add a k6-loadgen chaos fault (#687)
* feat: add k6-loadgen

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
2024-03-07 19:19:51 +05:30
smit thakkar 96f6571e77
fix: accommodate for pending pods with no IP address in network fault (#684)
Signed-off-by: smit thakkar <smit.thakkar@deliveryhero.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2024-03-01 15:06:07 +05:30
Nageshbansal b9f897be21
Adds support for tolerations in source cmd probe (#681)
Signed-off-by: nagesh bansal <nageshbansal59@gmail.com>
2024-03-01 14:51:55 +05:30
Michael Morris c2f8f79ab9
Fix consider appKind when filtering target pods (#680)
* Fix consider appKind when filtering target pods

Signed-off-by: MichaelMorris <michael.morris@est.tech>

* Implemented review comment

Signed-off-by: MichaelMorris <michael.morris@est.tech>

---------

Signed-off-by: MichaelMorris <michael.morris@est.tech>
2024-03-01 14:41:29 +05:30
Nageshbansal 69927489d2
Fixes Probe logging for all iterations (#676)
* Fixes Probe logging for all iterations

Signed-off-by: nagesh bansal <nageshbansal59@gmail.com>
2024-01-11 17:48:26 +05:30
Shubham Chaudhary bdddd0d803
Add port blacklisting in the pod-network faults (#673)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-10-12 19:37:56 +05:30
Shubham Chaudhary 1b75f78632
fix(action): Fix the github release action (#672)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-09-29 16:02:01 +05:30
Calvinaud b710216113
Revert chaos when error during drain for node-drain experiments (#668)
- Added a call to uncordonNode in case of an error in the drainNode function

Signed-off-by: Calvin Audier <calvin.audier@gmail.com>
2023-09-21 23:54:33 +05:30
Shubham Chaudhary 392ea29800
chore(network): fix the destination ips for network experiment for service mesh (#666)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-09-15 11:00:34 +05:30
Shubham Chaudhary db13d05e28
Add fix to remove the job labels from helper pod (#665)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-07-24 13:09:57 +05:30
Vedant Shrotria d737281985
Merge pull request #661 from Jonsy13/group-optional-litmus-go
Upgrading chaos-operator version for making group optional in k8s probe
2023-06-05 13:05:51 +05:30
Jonsy13 61751a9404
Added changes for operator upgrade
Signed-off-by: Jonsy13 <vedant.shrotria@harness.io>
2023-06-05 12:34:12 +05:30
Shubham Chaudhary d4f9826ea9
chore(fields): Updating optional fields to pointer type (#658)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-25 14:02:22 +05:30
Shubham Chaudhary 3ab28a5110
run workflow on dispatch event and use token from secrets (#657)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 01:10:08 +05:30
Shubham Chaudhary 3005d02c24
use the official snyk action (#656)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 01:01:09 +05:30
Shubham Chaudhary 1971b8093b
fix the snyk token name (#655)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:35:26 +05:30
Shubham Chaudhary e5a831f713
fix the github workflow (#654)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:29:54 +05:30
Shubham Chaudhary 95c9602019
adding security scan workflow (#653)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:24:53 +05:30
Shubham Chaudhary f36b0761aa
adding security scan workflow (#652)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-20 00:21:19 +05:30
Shubham Chaudhary d3b760d76d
chore(unit): Adding units to the duration fields (#650)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-18 13:40:10 +05:30
Shubham Chaudhary 0bbe8e23e7
Revert "probe comparator logging for all iterations (#646)" (#649)
This reverts commit 8e0bbbbd5d.
2023-04-18 01:01:48 +05:30
Neelanjan Manna 5ade71c694
chore(probe): Update Probe failure descriptions and error codes (#648)
* adds probe description changes

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-04-17 17:24:23 +05:30
Shubham Chaudhary 8e0bbbbd5d
probe comparator logging for all iterations (#646)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-17 11:24:47 +05:30
Shubham Chaudhary d0b36e9a50
fix(probe): ProbeSuccessPercentage should not be 100% if experiment terminated with Error (#645)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-04-10 15:17:51 +05:30
Shubham Chaudhary eee4421c3c
chore(sdk): Updating the sdk to latest experiment schema (#644)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-03-20 17:01:46 +05:30
Neelanjan Manna a1c85ca52c
chore(experiments): Replaces default container runtime to containerd (#640)
* replaces default container runtime to containerd

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-03-14 19:41:02 +05:30
Shubham Chaudhary f8b370e6f4
add the experiment phase as completed with error (#642)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-03-09 21:52:17 +05:30
Neelanjan Manna 04c031a281
updates http probe wait duration to ms (#643)
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2023-03-08 12:46:21 +05:30
Shubham Chaudhary ea2b83e1a0
adding backend compatibility to probe retry (#639)
* adding backend compatibility to probe retry

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating the chaos-operator version

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-22 10:03:56 +05:30
Shubham Chaudhary 291ae4a6ad
chore(error-verdict): Adding experiment verdict as error (#637)
* chore(error-verdict): Adding experiment verdict as error

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating error verdict

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* updating the chaos-operator version

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* adding comments and changing function name

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

---------

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-21 23:37:56 +05:30
Akash Shrivastava 8b68c4b5cb
Added filtering vm instance by tag (#635)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2023-02-15 16:48:47 +05:30
Shubham Chaudhary 7bdb18016f
chore(probe): updating retries to attempts and using the timeout as a per-attempt timeout (#636)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-09 17:02:31 +05:30
Shubham Chaudhary 4aa778ef9c
chore(probe-timeout): converting probe timeout to milliseconds (#634)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-02-05 01:34:39 +05:30
Shubham Chaudhary 1f02800c23
chore(parallel): add support to create unique runid for same timestamp (#633)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-20 11:11:12 +05:30
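One way to make run IDs collision-free for parallel runs started in the same timestamp tick is to derive them from `crypto/rand` instead of a time-seeded PRNG. This is a sketch of that idea, with an illustrative `getRunID` signature rather than the stringutils API.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

const letters = "abcdefghijklmnopqrstuvwxyz0123456789"

// getRunID returns an n-character random identifier. Using crypto/rand
// avoids the classic failure mode of seeding math/rand with the current
// time: two pods created in the same instant getting the same ID.
func getRunID(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	for i, b := range buf {
		// Map each random byte onto the allowed alphabet.
		buf[i] = letters[int(b)%len(letters)]
	}
	return string(buf), nil
}

func main() {
	a, _ := getRunID(8)
	b, _ := getRunID(8)
	// Two IDs generated "at once" still differ (with overwhelming probability).
	fmt.Println(a, b, a != b)
}
```

Note the modulo mapping introduces a slight bias toward the first alphabet characters; that is harmless for uniqueness, which is all a run ID needs.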
Shubham Chaudhary 2134933c03
fix(stderr): adding the fix for cmd.Exec considers log.info as stderr (#632)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-10 21:58:02 +05:30
Shubham Chaudhary d151c8f1e0
chore(sidecar): adding sidecar to the helper pod (#630)
* chore(sidecar): adding sidecar to the helper pod

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* adding support for multiple sidecars

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* chore(sidecar): adding env and envFrom fields

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-10 12:58:57 +05:30
Shubham Chaudhary 3622f505c9
chore(probe): Adding the root cause into probe description (#628)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2023-01-09 15:15:14 +05:30
Shubham Chaudhary dc9283614b
chore(sdk): adding failstep and lib changes to sdk (#627)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-16 00:36:10 +05:30
Shubham Chaudhary 5eed28bf3f
fix(vulrn): fixing the security vulnerabilities (#617)
* fix(vulrn): fixing the security vulnerabilities

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-15 17:22:13 +05:30
Shubham Chaudhary 77b30e221e
(chore): Adding user-friendly failsteps and removing non-litmus libs (#626)
* feat(failstep):  Adding failstep in all experiment and removed non-litmus libs

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-12-15 16:42:27 +05:30
Neelanjan Manna eb98d50855
fix(gcp-label-experiments): Fix label filtering logic (#593)
* fix(gcp-label-experiments): fix label filter logic

Signed-off-by: Neelanjan Manna <neelanjan.manna@harness.io>
2022-11-24 19:27:46 +05:30
Akash Shrivastava 3e72bb14e9
changed dd to use nsenter (#605)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-24 11:02:36 +05:30
Shubham Chaudhary 115ec45339
fix(pod-delete): fixing pod-delete experiment and refactor workload utils (#610)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-22 17:29:33 +05:30
Shubham Chaudhary 0e18911da6
chore(spring-boot): add spring-boot all faults option and remove duplicate code (#609)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-21 23:39:32 +05:30
Shubham Chaudhary e1eb389edf
Adding single helper and selectors changes to master (#608)
* feat(helper): adding single helper per node


Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-21 22:58:46 +05:30
Akash Shrivastava 39bbdbbf44
assigned msg var (#606)
Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>

Signed-off-by: Akash Shrivastava <akash.shrivastava@harness.io>
2022-11-18 14:14:57 +05:30
Shubham Chaudhary ff285178d5
chore(spring-boot): simplifying spring boot experiments env (#604)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-18 11:34:41 +05:30
Soumya Ghosh Dastidar f16249f802
feat: add resource name filtering in k8s probe (#598)
* feat: add resource name filtering in k8s probe

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
2022-11-14 12:49:55 +05:30
Shubham Chaudhary 21969543bf
chore(spring-boot): splitting spring-boot-chaos experiment into separate experiments (#594)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-14 11:30:41 +05:30
Shubham Chaudhary 7140565204
chore(sudo): fixing sudo command (#595)
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2022-11-07 21:03:09 +05:30
342 changed files with 12982 additions and 11575 deletions

View File

@@ -12,19 +12,12 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.17
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
#TODO: Add Dockerfile linting
# Running go-lint
- name: Checking Go-Lint
run : |
sudo apt-get update && sudo apt-get install golint
make gotasks
- name: gofmt check
run: |
if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
@@ -33,25 +26,21 @@ jobs:
gofmt -s -l .
exit 1
fi
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
security:
container:
image: litmuschaos/snyk:1.0
volumes:
- /home/runner/work/_actions/:/home/runner/work/_actions/
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
gitleaks-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: snyk/actions/setup@master
- run: snyk auth ${SNYK_TOKEN}
- uses: actions/setup-go@v1
- uses: actions/checkout@v3
with:
go-version: '1.17'
- name: Snyk monitor
run: snyk test
fetch-depth: 0
- name: Run GitLeaks
run: |
wget https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz && \
tar -zxvf gitleaks_8.18.2_linux_x64.tar.gz && \
sudo mv gitleaks /usr/local/bin && gitleaks detect --source . -v
build:
needs: pre-checks
@@ -60,7 +49,7 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.17
go-version: '1.20'
- uses: actions/checkout@v2
with:
@@ -84,6 +73,7 @@ jobs:
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:ci
build-args: LITMUS_VERSION=3.10.0
trivy:
needs: pre-checks
@@ -95,8 +85,8 @@ jobs:
- name: Build an image from Dockerfile
run: |
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.10.0
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
@@ -105,4 +95,4 @@ jobs:
exit-code: '1'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'
severity: 'CRITICAL,HIGH'

View File

@@ -13,16 +13,9 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.17
go-version: '1.20'
- uses: actions/checkout@v2
#TODO: Add Dockerfile linting
# Running go-lint
- name: Checking Go-Lint
run : |
sudo apt-get update && sudo apt-get install golint
make gotasks
- name: gofmt check
run: |
if [ "$(gofmt -s -l . | wc -l)" -ne 0 ]
@@ -31,9 +24,9 @@ jobs:
gofmt -s -l .
exit 1
fi
- name: golangci-lint
uses: reviewdog/action-golangci-lint@v1
uses: reviewdog/action-golangci-lint@v1
push:
needs: pre-checks
@@ -43,7 +36,7 @@ jobs:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.17
go-version: '1.20'
- uses: actions/checkout@v2
- name: Set up QEMU
@@ -70,3 +63,4 @@ jobs:
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:ci
build-args: LITMUS_VERSION=3.10.0

View File

@@ -8,29 +8,21 @@ on:
jobs:
pre-checks:
runs-on: ubuntu-latest
if: ${{ startsWith(github.ref, 'refs/tags/') }}
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.17
go-version: '1.20'
- uses: actions/checkout@v2
#TODO: Add Dockerfile linting
# Running go-lint
- name: Checking Go-Lint
run : |
sudo apt-get update && sudo apt-get install golint
make gotasks
push:
needs: pre-checks
if: ${{ startsWith(github.ref, 'refs/tags/') }}
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: 1.17
go-version: '1.20'
- uses: actions/checkout@v2
- name: Set Tag
@@ -43,7 +35,7 @@ jobs:
run: |
echo "RELEASE TAG: ${RELEASE_TAG}"
echo "${RELEASE_TAG}" > ${{ github.workspace }}/tag.txt
- name: Set up QEMU
uses: docker/setup-qemu-action@v1
with:
@@ -63,10 +55,11 @@ jobs:
- name: Build and push
uses: docker/build-push-action@v2
env:
env:
RELEASE_TAG: ${{ env.RELEASE_TAG }}
with:
push: true
file: build/Dockerfile
platforms: linux/amd64,linux/arm64
tags: litmuschaos/go-runner:${{ env.RELEASE_TAG }},litmuschaos/go-runner:latest
build-args: LITMUS_VERSION=3.10.0

View File

@@ -9,215 +9,15 @@ on:
- '**.yaml'
jobs:
# Helm_Install_Generic_Tests:
# runs-on: ubuntu-18.04
# steps:
# - uses: actions/checkout@v2
# with:
# ref: ${{ github.event.pull_request.head.sha }}
# - name: Generate go binary and build docker image
# run: make build-amd64
# #Install and configure a kind cluster
# - name: Installing KinD cluster for the test
# uses: engineerd/setup-kind@v0.5.0
# with:
# version: "v0.7.0"
# config: "build/kind-cluster/kind-config.yaml"
# - name: Configuring and testing the Installation
# run: |
# kubectl taint nodes kind-control-plane node-role.kubernetes.io/master-
# kind get kubeconfig --internal >$HOME/.kube/config
# kubectl cluster-info --context kind-kind
# kubectl get nodes
# - name: Load docker image
# run: /usr/local/bin/kind load docker-image litmuschaos/go-runner:ci
# - name: Deploy a sample application for chaos injection
# run: |
# kubectl apply -f https://raw.githubusercontent.com/litmuschaos/chaos-ci-lib/master/app/nginx.yml
# kubectl wait --for=condition=Ready pods --all --namespace default --timeout=90s
# - name: Setting up kubeconfig ENV for Github Chaos Action
# run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
# env:
# ACTIONS_ALLOW_UNSECURE_COMMANDS: true
# - name: Setup Litmus
# uses: litmuschaos/github-chaos-actions@master
# env:
# INSTALL_LITMUS: true
# - name: Running Litmus pod delete chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-delete
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# - name: Running container kill chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: container-kill
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# CONTAINER_RUNTIME: containerd
# - name: Running node-cpu-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: node-cpu-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# - name: Running node-memory-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: node-memory-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# - name: Running pod-cpu-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-cpu-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# CPU_CORES: 1
# - name: Running pod-memory-hog chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-memory-hog
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# MEMORY_CONSUMPTION: 500
# - name: Running pod network corruption chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-corruption
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# CONTAINER_RUNTIME: containerd
# - name: Running pod network duplication chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-duplication
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# CONTAINER_RUNTIME: containerd
# - name: Running pod-network-latency chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-latency
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# NETWORK_LATENCY: 60000
# CONTAINER_RUNTIME: containerd
# - name: Running pod-network-loss chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-network-loss
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TARGET_CONTAINER: nginx
# TOTAL_CHAOS_DURATION: 60
# NETWORK_INTERFACE: eth0
# NETWORK_PACKET_LOSS_PERCENTAGE: 100
# CONTAINER_RUNTIME: containerd
# - name: Running pod autoscaler chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: pod-autoscaler
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TOTAL_CHAOS_DURATION: 60
# - name: Running node-io-stress chaos experiment
# if: always()
# uses: litmuschaos/github-chaos-actions@master
# env:
# EXPERIMENT_NAME: node-io-stress
# EXPERIMENT_IMAGE: litmuschaos/go-runner
# EXPERIMENT_IMAGE_TAG: ci
# IMAGE_PULL_POLICY: IfNotPresent
# JOB_CLEANUP_POLICY: delete
# TOTAL_CHAOS_DURATION: 120
# FILESYSTEM_UTILIZATION_PERCENTAGE: 10
# - name: Uninstall Litmus
# uses: litmuschaos/github-chaos-actions@master
# env:
# LITMUS_CLEANUP: true
# - name: Deleting KinD cluster
# if: always()
# run: kind delete cluster
Pod_Level_In_Serial_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
- uses: actions/setup-go@v5
with:
go-version: '1.17'
go-version: '1.20'
- uses: actions/checkout@v2
with:
@@ -226,94 +26,16 @@ jobs:
- name: Generating Go binary and Building docker image
run: |
make build-amd64
#Install and configure a kind cluster
- name: Installing Prerequisites (K3S Cluster)
env:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.11+k3s1 sh -s - --docker --write-kubeconfig-mode 664
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in series mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-series
- name: Deleting K3S cluster
if: always()
run: /usr/local/bin/k3s-uninstall.sh
Pod_Level_In_Parallel_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '1.17'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
- name: Install KinD
run: |
make build-amd64
#Install and configure a kind cluster
- name: Installing Prerequisites (K3S Cluster)
env:
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.21.11+k3s1 sh -s - --docker --write-kubeconfig-mode 664
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in parallel mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /etc/rancher/k3s/k3s.yaml
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-parallel
- name: Deleting K3S cluster
if: always()
run: /usr/local/bin/k3s-uninstall.sh
Node_Level_Tests:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v2
with:
go-version: '1.17'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: kind create cluster --config build/kind-cluster/kind-config.yaml
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
run: |
@@ -324,7 +46,123 @@ jobs:
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in series mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-series
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
Pod_Level_In_Parallel_Mode:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
env:
KUBECONFIG: /home/runner/.kube/config
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
repository: 'litmuschaos/litmus-e2e'
ref: 'master'
- name: Running Pod level experiment with affected percentage 100 and in parallel mode
env:
GO_EXPERIMENT_IMAGE: litmuschaos/go-runner:ci
EXPERIMENT_IMAGE_PULL_POLICY: IfNotPresent
KUBECONFIG: /home/runner/.kube/config
run: |
make build-litmus
make app-deploy
make pod-affected-perc-ton-parallel
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
Node_Level_Tests:
runs-on: ubuntu-latest
steps:
# Install golang
- uses: actions/setup-go@v5
with:
go-version: '1.20'
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Generating Go binary and Building docker image
run: |
make build-amd64
- name: Install KinD
run: |
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
- name: Create KinD Cluster
run: |
kind create cluster --config build/kind-cluster/kind-config.yaml
- name: Configuring and testing the Installation
run: |
kubectl taint nodes kind-control-plane node-role.kubernetes.io/control-plane-
kubectl cluster-info --context kind-kind
kubectl wait node --all --for condition=ready --timeout=90s
kubectl get nodes
- name: Load image on the nodes of the cluster
run: |
kind load docker-image --name=kind litmuschaos/go-runner:ci
- uses: actions/checkout@v2
with:
@@ -355,4 +193,6 @@ jobs:
- name: Deleting KinD cluster
if: always()
run: kind delete cluster
run: |
kubectl get nodes
kind delete cluster

.github/workflows/security-scan.yml (new file, 27 lines)

@@ -0,0 +1,27 @@
---
name: Security Scan
on:
workflow_dispatch:
jobs:
trivy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.event.pull_request.head.sha }}
- name: Build an image from Dockerfile
run: |
docker build -f build/Dockerfile -t docker.io/litmuschaos/go-runner:${{ github.sha }} . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: 'docker.io/litmuschaos/go-runner:${{ github.sha }}'
format: 'table'
exit-code: '1'
ignore-unfixed: true
vuln-type: 'os,library'
severity: 'CRITICAL,HIGH'


@@ -31,7 +31,7 @@ deps: _build_check_docker
_build_check_docker:
@echo "------------------"
@echo "--> Check the Docker deps"
@echo "------------------"
@if [ $(IS_DOCKER_INSTALLED) -eq 1 ]; \
then echo "" \
@@ -56,7 +56,7 @@ unused-package-check:
.PHONY: docker.buildx
docker.buildx:
@echo "------------------------------"
@echo "--> Setting up Builder"
@echo "------------------------------"
@if ! docker buildx ls | grep -q multibuilder; then\
docker buildx create --name multibuilder;\
@@ -69,27 +69,27 @@ push: docker.buildx image-push
image-push:
@echo "------------------------"
@echo "--> Push go-runner image"
@echo "------------------------"
@echo "Pushing $(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)"
@docker buildx build . --push --file build/Dockerfile --progress plane --platform linux/arm64,linux/amd64 --no-cache --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
@docker buildx build . --push --file build/Dockerfile --progress plain --platform linux/arm64,linux/amd64 --no-cache --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
.PHONY: build-amd64
build-amd64:
@echo "-------------------------"
@echo "--> Build go-runner image"
@echo "-------------------------"
@sudo docker build --file build/Dockerfile --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) . --build-arg TARGETARCH=amd64
@sudo docker build --file build/Dockerfile --tag $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG) . --build-arg TARGETARCH=amd64 --build-arg LITMUS_VERSION=3.9.0
.PHONY: push-amd64
push-amd64:
@echo "------------------------------"
@echo "--> Pushing image"
@echo "------------------------------"
@sudo docker push $(DOCKER_REGISTRY)/$(DOCKER_REPO)/$(DOCKER_IMAGE):$(DOCKER_TAG)
.PHONY: trivy-check
trivy-check:


@@ -1,7 +1,11 @@
package main
import (
"context"
"errors"
"flag"
"os"
// Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth"
@@ -11,6 +15,8 @@ import (
// _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
// _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"
"go.opentelemetry.io/otel"
awsSSMChaosByID "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-id/experiment"
awsSSMChaosByTag "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-tag/experiment"
azureDiskLoss "github.com/litmuschaos/litmus-go/experiments/azure/azure-disk-loss/experiment"
@@ -51,16 +57,19 @@ import (
podNetworkLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-latency/experiment"
podNetworkLoss "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-loss/experiment"
podNetworkPartition "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-partition/experiment"
podNetworkRateLimit "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-rate-limit/experiment"
kafkaBrokerPodFailure "github.com/litmuschaos/litmus-go/experiments/kafka/kafka-broker-pod-failure/experiment"
ebsLossByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-id/experiment"
ebsLossByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-tag/experiment"
ec2TerminateByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-id/experiment"
ec2TerminateByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-tag/experiment"
springBootChaos "github.com/litmuschaos/litmus-go/experiments/spring-boot/spring-boot-chaos/experiment"
rdsInstanceStop "github.com/litmuschaos/litmus-go/experiments/kube-aws/rds-instance-stop/experiment"
k6Loadgen "github.com/litmuschaos/litmus-go/experiments/load/k6-loadgen/experiment"
springBootFaults "github.com/litmuschaos/litmus-go/experiments/spring-boot/spring-boot-faults/experiment"
vmpoweroff "github.com/litmuschaos/litmus-go/experiments/vmware/vm-poweroff/experiment"
"github.com/litmuschaos/litmus-go/pkg/clients"
cli "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus"
)
@@ -74,8 +83,25 @@ func init() {
}
func main() {
initCtx := context.Background()
clients := clients.ClientSets{}
// Set up Observability.
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
shutdown, err := telemetry.InitOTelSDK(initCtx, true, otelExporterEndpoint)
if err != nil {
log.Errorf("Failed to initialize OTel SDK: %v", err)
return
}
defer func() {
err = errors.Join(err, shutdown(initCtx))
}()
initCtx = telemetry.GetTraceParentContext()
}
clients := cli.ClientSets{}
ctx, span := otel.Tracer(telemetry.TracerName).Start(initCtx, "ExecuteExperiment")
defer span.End()
// parse the experiment name
experimentName := flag.String("name", "pod-delete", "name of the chaos experiment")
@@ -88,102 +114,108 @@ func main() {
log.Infof("Experiment Name: %v", *experimentName)
// invoke the corresponding experiment based on the the (-name) flag
// invoke the corresponding experiment based on the (-name) flag
switch *experimentName {
case "container-kill":
containerKill.ContainerKill(clients)
containerKill.ContainerKill(ctx, clients)
case "disk-fill":
diskFill.DiskFill(clients)
diskFill.DiskFill(ctx, clients)
case "kafka-broker-pod-failure":
kafkaBrokerPodFailure.KafkaBrokerPodFailure(clients)
kafkaBrokerPodFailure.KafkaBrokerPodFailure(ctx, clients)
case "kubelet-service-kill":
kubeletServiceKill.KubeletServiceKill(clients)
kubeletServiceKill.KubeletServiceKill(ctx, clients)
case "docker-service-kill":
dockerServiceKill.DockerServiceKill(clients)
dockerServiceKill.DockerServiceKill(ctx, clients)
case "node-cpu-hog":
nodeCPUHog.NodeCPUHog(clients)
nodeCPUHog.NodeCPUHog(ctx, clients)
case "node-drain":
nodeDrain.NodeDrain(clients)
nodeDrain.NodeDrain(ctx, clients)
case "node-io-stress":
nodeIOStress.NodeIOStress(clients)
nodeIOStress.NodeIOStress(ctx, clients)
case "node-memory-hog":
nodeMemoryHog.NodeMemoryHog(clients)
nodeMemoryHog.NodeMemoryHog(ctx, clients)
case "node-taint":
nodeTaint.NodeTaint(clients)
nodeTaint.NodeTaint(ctx, clients)
case "pod-autoscaler":
podAutoscaler.PodAutoscaler(clients)
podAutoscaler.PodAutoscaler(ctx, clients)
case "pod-cpu-hog-exec":
podCPUHogExec.PodCPUHogExec(clients)
podCPUHogExec.PodCPUHogExec(ctx, clients)
case "pod-delete":
podDelete.PodDelete(clients)
podDelete.PodDelete(ctx, clients)
case "pod-io-stress":
podIOStress.PodIOStress(clients)
podIOStress.PodIOStress(ctx, clients)
case "pod-memory-hog-exec":
podMemoryHogExec.PodMemoryHogExec(clients)
podMemoryHogExec.PodMemoryHogExec(ctx, clients)
case "pod-network-corruption":
podNetworkCorruption.PodNetworkCorruption(clients)
podNetworkCorruption.PodNetworkCorruption(ctx, clients)
case "pod-network-duplication":
podNetworkDuplication.PodNetworkDuplication(clients)
podNetworkDuplication.PodNetworkDuplication(ctx, clients)
case "pod-network-latency":
podNetworkLatency.PodNetworkLatency(clients)
podNetworkLatency.PodNetworkLatency(ctx, clients)
case "pod-network-loss":
podNetworkLoss.PodNetworkLoss(clients)
podNetworkLoss.PodNetworkLoss(ctx, clients)
case "pod-network-partition":
podNetworkPartition.PodNetworkPartition(clients)
podNetworkPartition.PodNetworkPartition(ctx, clients)
case "pod-network-rate-limit":
podNetworkRateLimit.PodNetworkRateLimit(ctx, clients)
case "pod-memory-hog":
podMemoryHog.PodMemoryHog(clients)
podMemoryHog.PodMemoryHog(ctx, clients)
case "pod-cpu-hog":
podCPUHog.PodCPUHog(clients)
podCPUHog.PodCPUHog(ctx, clients)
case "cassandra-pod-delete":
cassandraPodDelete.CasssandraPodDelete(clients)
cassandraPodDelete.CasssandraPodDelete(ctx, clients)
case "aws-ssm-chaos-by-id":
awsSSMChaosByID.AWSSSMChaosByID(clients)
awsSSMChaosByID.AWSSSMChaosByID(ctx, clients)
case "aws-ssm-chaos-by-tag":
awsSSMChaosByTag.AWSSSMChaosByTag(clients)
awsSSMChaosByTag.AWSSSMChaosByTag(ctx, clients)
case "ec2-terminate-by-id":
ec2TerminateByID.EC2TerminateByID(clients)
ec2TerminateByID.EC2TerminateByID(ctx, clients)
case "ec2-terminate-by-tag":
ec2TerminateByTag.EC2TerminateByTag(clients)
ec2TerminateByTag.EC2TerminateByTag(ctx, clients)
case "ebs-loss-by-id":
ebsLossByID.EBSLossByID(clients)
ebsLossByID.EBSLossByID(ctx, clients)
case "ebs-loss-by-tag":
ebsLossByTag.EBSLossByTag(clients)
ebsLossByTag.EBSLossByTag(ctx, clients)
case "rds-instance-stop":
rdsInstanceStop.RDSInstanceStop(ctx, clients)
case "node-restart":
nodeRestart.NodeRestart(clients)
nodeRestart.NodeRestart(ctx, clients)
case "pod-dns-error":
podDNSError.PodDNSError(clients)
podDNSError.PodDNSError(ctx, clients)
case "pod-dns-spoof":
podDNSSpoof.PodDNSSpoof(clients)
podDNSSpoof.PodDNSSpoof(ctx, clients)
case "pod-http-latency":
podHttpLatency.PodHttpLatency(clients)
podHttpLatency.PodHttpLatency(ctx, clients)
case "pod-http-status-code":
podHttpStatusCode.PodHttpStatusCode(clients)
podHttpStatusCode.PodHttpStatusCode(ctx, clients)
case "pod-http-modify-header":
podHttpModifyHeader.PodHttpModifyHeader(clients)
podHttpModifyHeader.PodHttpModifyHeader(ctx, clients)
case "pod-http-modify-body":
podHttpModifyBody.PodHttpModifyBody(clients)
podHttpModifyBody.PodHttpModifyBody(ctx, clients)
case "pod-http-reset-peer":
podHttpResetPeer.PodHttpResetPeer(clients)
podHttpResetPeer.PodHttpResetPeer(ctx, clients)
case "vm-poweroff":
vmpoweroff.VMPoweroff(clients)
vmpoweroff.VMPoweroff(ctx, clients)
case "azure-instance-stop":
azureInstanceStop.AzureInstanceStop(clients)
azureInstanceStop.AzureInstanceStop(ctx, clients)
case "azure-disk-loss":
azureDiskLoss.AzureDiskLoss(clients)
azureDiskLoss.AzureDiskLoss(ctx, clients)
case "gcp-vm-disk-loss":
gcpVMDiskLoss.VMDiskLoss(clients)
gcpVMDiskLoss.VMDiskLoss(ctx, clients)
case "pod-fio-stress":
podFioStress.PodFioStress(clients)
podFioStress.PodFioStress(ctx, clients)
case "gcp-vm-instance-stop":
gcpVMInstanceStop.VMInstanceStop(clients)
gcpVMInstanceStop.VMInstanceStop(ctx, clients)
case "redfish-node-restart":
redfishNodeRestart.NodeRestart(clients)
redfishNodeRestart.NodeRestart(ctx, clients)
case "gcp-vm-instance-stop-by-label":
gcpVMInstanceStopByLabel.GCPVMInstanceStopByLabel(clients)
gcpVMInstanceStopByLabel.GCPVMInstanceStopByLabel(ctx, clients)
case "gcp-vm-disk-loss-by-label":
gcpVMDiskLossByLabel.GCPVMDiskLossByLabel(clients)
case "spring-boot-chaos":
springBootChaos.Experiment(clients)
gcpVMDiskLossByLabel.GCPVMDiskLossByLabel(ctx, clients)
case "spring-boot-cpu-stress", "spring-boot-memory-stress", "spring-boot-exceptions", "spring-boot-app-kill", "spring-boot-faults", "spring-boot-latency":
springBootFaults.Experiment(ctx, clients, *experimentName)
case "k6-loadgen":
k6Loadgen.Experiment(ctx, clients)
default:
log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *experimentName)
return
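The observability bootstrap added to `main` above uses a named return together with `errors.Join` so that the error from the deferred OTel `shutdown` call is not silently dropped. That pattern can be sketched with only the standard library; `fakeInit` below is a hypothetical stand-in for `telemetry.InitOTelSDK`:

```go
package main

import (
	"errors"
	"fmt"
)

// fakeInit stands in for telemetry.InitOTelSDK (hypothetical here): it
// returns a shutdown function whose error must not be silently dropped.
func fakeInit() (func() error, error) {
	return func() error { return errors.New("exporter flush failed") }, nil
}

// run mirrors the pattern in main(): the named return value lets the
// deferred errors.Join fold the shutdown error into whatever run returns.
func run() (err error) {
	shutdown, err := fakeInit()
	if err != nil {
		return err
	}
	defer func() {
		err = errors.Join(err, shutdown())
	}()
	return nil
}

func main() {
	fmt.Println(run())
}
```

Without the named return, a plain `defer shutdown()` would discard the flush error entirely; `errors.Join` (Go 1.20+) keeps both the run error and the shutdown error reachable in one chain.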


@@ -1,7 +1,11 @@
package main
import (
"context"
"errors"
"flag"
"os"
// Uncomment to load all auth plugins
// _ "k8s.io/client-go/plugin/pkg/client/auth"
@@ -17,10 +21,11 @@ import (
networkChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/helper"
dnsChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/pod-dns-chaos/helper"
stressChaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/helper"
"github.com/litmuschaos/litmus-go/pkg/clients"
cli "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
func init() {
@@ -33,8 +38,24 @@ func init() {
}
func main() {
ctx := context.Background()
// Set up Observability.
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
shutdown, err := telemetry.InitOTelSDK(ctx, true, otelExporterEndpoint)
if err != nil {
log.Errorf("Failed to initialize OTel SDK: %v", err)
return
}
defer func() {
err = errors.Join(err, shutdown(ctx))
}()
ctx = telemetry.GetTraceParentContext()
}
clients := clients.ClientSets{}
clients := cli.ClientSets{}
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "ExecuteExperimentHelper")
defer span.End()
// parse the helper name
helperName := flag.String("name", "", "name of the helper pod")
@@ -50,17 +71,17 @@ func main() {
// invoke the corresponding helper based on the the (-name) flag
switch *helperName {
case "container-kill":
containerKill.Helper(clients)
containerKill.Helper(ctx, clients)
case "disk-fill":
diskFill.Helper(clients)
diskFill.Helper(ctx, clients)
case "dns-chaos":
dnsChaos.Helper(clients)
dnsChaos.Helper(ctx, clients)
case "stress-chaos":
stressChaos.Helper(clients)
stressChaos.Helper(ctx, clients)
case "network-chaos":
networkChaos.Helper(clients)
networkChaos.Helper(ctx, clients)
case "http-chaos":
httpChaos.Helper(clients)
httpChaos.Helper(ctx, clients)
default:
log.Errorf("Unsupported -name %v, please provide the correct value of -name args", *helperName)


@@ -1,6 +1,6 @@
# Multi-stage docker build
# Build stage
FROM golang:1.17 AS builder
FROM golang:1.22 AS builder
ARG TARGETOS=linux
ARG TARGETARCH
@@ -14,26 +14,99 @@ RUN export GOOS=${TARGETOS} && \
RUN CGO_ENABLED=0 go build -o /output/experiments ./bin/experiment
RUN CGO_ENABLED=0 go build -o /output/helpers ./bin/helper
FROM alpine:3.15.0 AS dep
# Install generally useful things
RUN apk --update add \
sudo \
iproute2 \
iptables
# Packaging stage
# Image source: https://github.com/litmuschaos/test-tools/blob/master/custom/hardened-alpine/experiment/Dockerfile
# The base image is non-root (have litmus user) with default litmus directory.
FROM litmuschaos/experiment-alpine
FROM registry.access.redhat.com/ubi9/ubi:9.4
LABEL maintainer="LitmusChaos"
COPY --from=builder /output/ /litmus
COPY --from=dep /usr/bin/sudo /usr/bin/
COPY --from=dep /sbin/tc /sbin/
COPY --from=dep /sbin/iptables /sbin/
ARG TARGETARCH
ARG LITMUS_VERSION
#Copying Necessary Files
COPY ./pkg/cloud/aws/common/ssm-docs/LitmusChaos-AWS-SSM-Docs.yml .
# Install generally useful things
RUN yum install -y \
sudo \
sshpass \
procps \
openssh-clients
# tc binary
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-6.2.0-5.el9.$(uname -m).rpm
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iproute-tc-6.2.0-5.el9.$(uname -m).rpm
# iptables
RUN yum install -y https://dl.rockylinux.org/vault/rocky/9.3/devel/$(uname -m)/os/Packages/i/iptables-libs-1.8.8-6.el9_1.$(uname -m).rpm
RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-libs-1.8.8-6.el9.2.$(uname -m).rpm
RUN yum install -y https://dl.fedoraproject.org/pub/archive/epel/9.3/Everything/$(uname -m)/Packages/i/iptables-legacy-1.8.8-6.el9.2.$(uname -m).rpm
# stress-ng
RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/Judy-1.0.5-28.el9.$(uname -m).rpm
RUN yum install -y https://yum.oracle.com/repo/OracleLinux/OL9/appstream/$(uname -m)/getPackage/stress-ng-0.14.00-2.el9.$(uname -m).rpm
#Installing Kubectl
ENV KUBE_LATEST_VERSION="v1.31.0"
RUN curl -L https://storage.googleapis.com/kubernetes-release/release/${KUBE_LATEST_VERSION}/bin/linux/${TARGETARCH}/kubectl -o /usr/bin/kubectl && \
chmod 755 /usr/bin/kubectl
#Installing crictl binaries
RUN curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.31.1/crictl-v1.31.1-linux-${TARGETARCH}.tar.gz --output crictl-v1.31.1-linux-${TARGETARCH}.tar.gz && \
tar zxvf crictl-v1.31.1-linux-${TARGETARCH}.tar.gz -C /sbin && \
chmod 755 /sbin/crictl
#Installing promql cli binaries
RUN curl -L https://github.com/chaosnative/promql-cli/releases/download/3.0.0-beta6/promql_linux_${TARGETARCH} --output /usr/bin/promql && chmod 755 /usr/bin/promql
#Installing pause cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/pause-linux-${TARGETARCH} --output /usr/bin/pause && chmod 755 /usr/bin/pause
#Installing dns_interceptor cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/dns_interceptor --output /sbin/dns_interceptor && chmod 755 /sbin/dns_interceptor
#Installing nsutil cli binaries
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil-linux-${TARGETARCH} --output /sbin/nsutil && chmod 755 /sbin/nsutil
#Installing nsutil shared lib
RUN curl -L https://github.com/litmuschaos/test-tools/releases/download/${LITMUS_VERSION}/nsutil_${TARGETARCH}.so --output /usr/local/lib/nsutil.so && chmod 755 /usr/local/lib/nsutil.so
# Installing toxiproxy binaries
RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/cli/cli/toxiproxy-cli-linux-${TARGETARCH}.tar.gz --output toxiproxy-cli-linux-${TARGETARCH}.tar.gz && \
tar zxvf toxiproxy-cli-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
chmod 755 /sbin/toxiproxy-cli
RUN curl -L https://litmus-http-proxy.s3.amazonaws.com/server/server/toxiproxy-server-linux-${TARGETARCH}.tar.gz --output toxiproxy-server-linux-${TARGETARCH}.tar.gz && \
tar zxvf toxiproxy-server-linux-${TARGETARCH}.tar.gz -C /sbin/ && \
chmod 755 /sbin/toxiproxy-server
ENV APP_USER=litmus
ENV APP_DIR="/$APP_USER"
ENV DATA_DIR="$APP_DIR/data"
# The USERD_ID of user
ENV APP_USER_ID=2000
RUN useradd -s /bin/true -u $APP_USER_ID -m -d $APP_DIR $APP_USER
# change to 0(root) group because openshift will run container with arbitrary uid as a member of root group
RUN chgrp -R 0 "$APP_DIR" && chmod -R g=u "$APP_DIR"
# Giving sudo to all users (required for almost all experiments)
RUN echo 'ALL ALL=(ALL:ALL) NOPASSWD: ALL' >> /etc/sudoers
WORKDIR $APP_DIR
COPY --from=builder /output/ .
COPY --from=docker:27.0.3 /usr/local/bin/docker /sbin/docker
RUN chmod 755 /sbin/docker
# Set permissions and ownership for the copied binaries
RUN chmod 755 ./experiments ./helpers && \
chown ${APP_USER}:0 ./experiments ./helpers
# Set ownership for binaries in /sbin and /usr/bin
RUN chown ${APP_USER}:0 /sbin/* /usr/bin/* && \
chown root:root /usr/bin/sudo && \
chmod 4755 /usr/bin/sudo
# Copying Necessary Files
COPY ./pkg/cloud/aws/common/ssm-docs/LitmusChaos-AWS-SSM-Docs.yml ./LitmusChaos-AWS-SSM-Docs.yml
RUN chown ${APP_USER}:0 ./LitmusChaos-AWS-SSM-Docs.yml && chmod 755 ./LitmusChaos-AWS-SSM-Docs.yml
USER ${APP_USER}


@@ -1,7 +1,6 @@
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker


@@ -1,23 +1,28 @@
package lib
import (
"context"
"os"
"strings"
"time"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
// InjectChaosInSerialMode will inject the aws ssm chaos in serial mode that is one after other
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInSerialMode")
defer span.End()
select {
case <-inject:
@@ -46,7 +51,7 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
ec2IDList := strings.Fields(ec2ID)
commandId, err := ssm.SendSSMCommand(experimentsDetails, ec2IDList)
if err != nil {
return errors.Errorf("fail to send ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to send ssm command")
}
//prepare commands for abort recovery
experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
@@ -54,21 +59,21 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//wait for the ssm command to get in running state
log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return errors.Errorf("fail to start ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to start ssm command")
}
common.SetTargets(ec2ID, "injected", "EC2", chaosDetails)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
//wait for the ssm command to get succeeded in the given chaos duration
log.Info("[Wait]: Waiting for the ssm command to get completed")
if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return errors.Errorf("fail to send ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to send ssm command")
}
common.SetTargets(ec2ID, "reverted", "EC2", chaosDetails)
@@ -85,7 +90,9 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// InjectChaosInParallelMode will inject the aws ssm chaos in parallel mode that is all at once
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, inject chan os.Signal) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultInParallelMode")
defer span.End()
select {
case <-inject:
@@ -110,7 +117,7 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Info("[Chaos]: Starting the ssm command")
commandId, err := ssm.SendSSMCommand(experimentsDetails, instanceIDList)
if err != nil {
return errors.Errorf("fail to send ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to send ssm command")
}
//prepare commands for abort recovery
experimentsDetails.CommandIDs = append(experimentsDetails.CommandIDs, commandId)
@@ -119,14 +126,14 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//wait for the ssm command to get in running state
log.Info("[Wait]: Waiting for the ssm command to get in InProgress state")
if err := ssm.WaitForCommandStatus("InProgress", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return errors.Errorf("fail to start ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to start ssm command")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -134,7 +141,7 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//wait for the ssm command to get succeeded in the given chaos duration
log.Info("[Wait]: Waiting for the ssm command to get completed")
if err := ssm.WaitForCommandStatus("Success", commandId, ec2ID, experimentsDetails.Region, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.Delay); err != nil {
return errors.Errorf("fail to send ssm command, err: %v", err)
return stacktrace.Propagate(err, "failed to send ssm command")
}
}
@@ -159,14 +166,14 @@ func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, abort c
case len(experimentsDetails.CommandIDs) != 0:
for _, commandId := range experimentsDetails.CommandIDs {
if err := ssm.CancelCommand(commandId, experimentsDetails.Region); err != nil {
log.Errorf("[Abort]: fail to cancle command, recovery failed, err: %v", err)
log.Errorf("[Abort]: Failed to cancel command, recovery failed: %v", err)
}
}
default:
log.Info("[Abort]: No command found to cancle")
log.Info("[Abort]: No SSM Command found to cancel")
}
if err := ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region); err != nil {
log.Errorf("fail to delete ssm doc, err: %v", err)
log.Errorf("Failed to delete ssm document: %v", err)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)


@@ -1,6 +1,8 @@
package ssm
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
@@ -8,12 +10,15 @@ import (
"github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@@ -21,8 +26,10 @@ var (
inject, abort chan os.Signal
)
//PrepareAWSSSMChaosByID contains the prepration and injection steps for the experiment
func PrepareAWSSSMChaosByID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareAWSSSMChaosByID contains the preparation and injection steps for the experiment
func PrepareAWSSSMChaosByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSSSMFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -42,7 +49,7 @@ func PrepareAWSSSMChaosByID(experimentsDetails *experimentTypes.ExperimentDetail
//create and upload the ssm document on the given aws service monitoring docs
if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
return errors.Errorf("fail to create and upload ssm doc, err: %v", err)
return stacktrace.Propagate(err, "could not create and upload the ssm document")
}
experimentsDetails.IsDocsUploaded = true
log.Info("[Info]: SSM docs uploaded successfully")
@@ -52,27 +59,27 @@ func PrepareAWSSSMChaosByID(experimentsDetails *experimentTypes.ExperimentDetail
//get the instance id or list of instance ids
instanceIDList := strings.Split(experimentsDetails.EC2InstanceID, ",")
if len(instanceIDList) == 0 {
return errors.Errorf("no instance id found for chaos injection")
if experimentsDetails.EC2InstanceID == "" || len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = lib.InjectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return err
if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = lib.InjectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return err
if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Delete the ssm document on the given aws service monitoring docs
err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
if err != nil {
return errors.Errorf("fail to delete ssm doc, err: %v", err)
return stacktrace.Propagate(err, "failed to delete ssm doc")
}
//Waiting for the ramp time after chaos injection


@@ -1,6 +1,8 @@
package ssm
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
@@ -8,16 +10,21 @@ import (
"github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
//PrepareAWSSSMChaosByTag contains the prepration and injection steps for the experiment
func PrepareAWSSSMChaosByTag(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareAWSSSMChaosByTag contains the preparation and injection steps for the experiment
func PrepareAWSSSMChaosByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSSSMFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -37,7 +44,7 @@ func PrepareAWSSSMChaosByTag(experimentsDetails *experimentTypes.ExperimentDetai
//create and upload the ssm document on the given aws service monitoring docs
if err = ssm.CreateAndUploadDocument(experimentsDetails.DocumentName, experimentsDetails.DocumentType, experimentsDetails.DocumentFormat, experimentsDetails.DocumentPath, experimentsDetails.Region); err != nil {
return errors.Errorf("fail to create and upload ssm doc, err: %v", err)
return stacktrace.Propagate(err, "could not create and upload the ssm document")
}
experimentsDetails.IsDocsUploaded = true
log.Info("[Info]: SSM docs uploaded successfully")
@@ -48,26 +55,26 @@ func PrepareAWSSSMChaosByTag(experimentsDetails *experimentTypes.ExperimentDetai
log.Infof("[Chaos]: Number of Instances targeted: %v", len(instanceIDList))
if len(instanceIDList) == 0 {
return errors.Errorf("no instance id found for chaos injection")
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance id found for chaos injection"}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = lib.InjectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return err
if err = lib.InjectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = lib.InjectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return err
if err = lib.InjectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails, inject); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Delete the ssm document on the given aws service monitoring docs
err = ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region)
if err != nil {
return errors.Errorf("fail to delete ssm doc, err: %v", err)
return stacktrace.Propagate(err, "failed to delete ssm doc")
}
//Waiting for the ramp time after chaos injection


@@ -1,6 +1,8 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
@@ -9,16 +11,19 @@ import (
"github.com/Azure/azure-sdk-for-go/profiles/latest/compute/mgmt/compute"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/disk-loss/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
diskStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/disk"
instanceStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@@ -26,8 +31,10 @@ var (
inject, abort chan os.Signal
)
//PrepareChaos contains the prepration and injection steps for the experiment
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareChaos contains the preparation and injection steps for the experiment
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureDiskLossFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -47,13 +54,13 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
//get the disk name or list of disk names
diskNameList := strings.Split(experimentsDetails.VirtualDiskNames, ",")
if len(diskNameList) == 0 {
return errors.Errorf("no volume names found to detach")
if experimentsDetails.VirtualDiskNames == "" || len(diskNameList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume names found to detach"}
}
instanceNamesWithDiskNames, err := diskStatus.GetInstanceNameForDisks(diskNameList, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup)
if err != nil {
return errors.Errorf("error fetching attached instances for disks, err: %v", err)
return stacktrace.Propagate(err, "error fetching attached instances for disks")
}
// Get the instance name with attached disks
@@ -62,7 +69,7 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
for instanceName := range instanceNamesWithDiskNames {
attachedDisksWithInstance[instanceName], err = diskStatus.GetInstanceDiskList(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, experimentsDetails.ScaleSet, instanceName)
if err != nil {
return errors.Errorf("error fetching virtual disks, err: %v", err)
return stacktrace.Propagate(err, "error fetching virtual disks")
}
}
@@ -77,15 +84,15 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNamesWithDiskNames, attachedDisksWithInstance, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -97,8 +104,10 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
return nil
}
// injectChaosInParallelMode will inject the azure disk loss chaos in parallel mode that is all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInParallelMode will inject the Azure disk loss chaos in parallel mode that is all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@@ -107,7 +116,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on azure virtual disk"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disk"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
@@ -116,7 +125,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Info("[Chaos]: Detaching the virtual disks from the instances")
for instanceName, diskNameList := range instanceNamesWithDiskNames {
if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
return errors.Errorf("failed to detach disks, err: %v", err)
return stacktrace.Propagate(err, "failed to detach disks")
}
}
// Waiting for disk to be detached
@@ -124,7 +133,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
return errors.Errorf("disk attach check failed, err: %v", err)
return stacktrace.Propagate(err, "disk detachment check failed")
}
}
}
@@ -137,8 +146,8 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -150,24 +159,24 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Info("[Chaos]: Attaching the Virtual disks back to the instances")
for instanceName, diskNameList := range attachedDisksWithInstance {
if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameList); err != nil {
return errors.Errorf("virtual disk attachment failed, err: %v", err)
return stacktrace.Propagate(err, "virtual disk attachment failed")
}
}
// Wait for disk to be attached
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return errors.Errorf("disk attach check failed, err: %v", err)
// Wait for disk to be attached
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return stacktrace.Propagate(err, "disk attachment check failed")
}
}
}
}
// Updating the result details
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
// Updating the result details
for _, diskNameList := range instanceNamesWithDiskNames {
for _, diskName := range diskNameList {
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
}
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@@ -175,8 +184,10 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
return nil
}
//injectChaosInSerialMode will inject the azure disk loss chaos in serial mode that is one after other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode will inject the Azure disk loss chaos in serial mode that is one after other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesWithDiskNames map[string][]string, attachedDisksWithInstance map[string]*[]compute.DataDisk, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureDiskLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@@ -185,7 +196,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
for duration < experimentsDetails.ChaosDuration {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on azure virtual disks"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure virtual disks"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
@@ -198,13 +209,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// Detaching the virtual disks
log.Infof("[Chaos]: Detaching %v from the instance", diskName)
if err = diskStatus.DetachDisks(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskNameToList); err != nil {
return errors.Errorf("failed to detach disks, err: %v", err)
return stacktrace.Propagate(err, "failed to detach disks")
}
// Waiting for disk to be detached
log.Infof("[Wait]: Waiting for Disk '%v' to detach", diskName)
if err := diskStatus.WaitForDiskToDetach(experimentsDetails, diskName); err != nil {
return errors.Errorf("disk detach check failed, err: %v", err)
return stacktrace.Propagate(err, "disk detachment check failed")
}
common.SetTargets(diskName, "detached", "VirtualDisk", chaosDetails)
@@ -212,8 +223,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -224,13 +235,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Attaching the virtual disks to the instance
log.Infof("[Chaos]: Attaching %v back to the instance", diskName)
if err = diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, attachedDisksWithInstance[instanceName]); err != nil {
return errors.Errorf("disk attachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk attachment failed")
}
// Waiting for disk to be attached
log.Infof("[Wait]: Waiting for Disk '%v' to attach", diskName)
if err := diskStatus.WaitForDiskToAttach(experimentsDetails, diskName); err != nil {
return errors.Errorf("disk attach check failed, err: %v", err)
return stacktrace.Propagate(err, "disk attachment check failed")
}
common.SetTargets(diskName, "re-attached", "VirtualDisk", chaosDetails)
@@ -257,10 +268,10 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, attache
Try(func(attempt uint) error {
status, err := instanceStatus.GetAzureInstanceProvisionStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet)
if err != nil {
return errors.Errorf("Failed to get instance, err: %v", err)
return stacktrace.Propagate(err, "failed to get instance")
}
if status != "Provisioning succeeded" {
return errors.Errorf("instance is updating, waiting for instance to finish update")
return stacktrace.NewError("instance is updating, waiting for instance to finish update")
}
return nil
})
@@ -271,11 +282,11 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, attache
for _, disk := range *diskList {
diskStatusString, err := diskStatus.GetDiskStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, *disk.Name)
if err != nil {
log.Errorf("Failed to get disk status, err: %v", err)
log.Errorf("Failed to get disk status: %v", err)
}
if diskStatusString != "Attached" {
if err := diskStatus.AttachDisk(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, instanceName, experimentsDetails.ScaleSet, diskList); err != nil {
log.Errorf("failed to attach disk '%v, manual revert required, err: %v", err)
log.Errorf("Failed to attach disk, manual revert required: %v", err)
} else {
common.SetTargets(*disk.Name, "re-attached", "VirtualDisk", chaosDetails)
}


@@ -1,6 +1,8 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
@@ -8,15 +10,18 @@ import (
"time"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/azure/instance-stop/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
azureCommon "github.com/litmuschaos/litmus-go/pkg/cloud/azure/common"
azureStatus "github.com/litmuschaos/litmus-go/pkg/cloud/azure/instance"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@@ -25,7 +30,9 @@ var (
)
// PrepareAzureStop will initialize instanceNameList and start chaos injection based on sequence method selected
func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareAzureStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAzureInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications
inject = make(chan os.Signal, 1)
@@ -44,8 +51,8 @@ func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, cli
// get the instance name or list of instance names
instanceNameList := strings.Split(experimentsDetails.AzureInstanceNames, ",")
if len(instanceNameList) == 0 {
return errors.Errorf("no instance name found to stop")
if experimentsDetails.AzureInstanceNames == "" || len(instanceNameList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no instance name found to stop"}
}
// watching for the abort signal and revert the chaos
@@ -53,15 +60,15 @@ func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, cli
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceNameList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
@ -72,8 +79,11 @@ func PrepareAzureStop(experimentsDetails *experimentTypes.ExperimentDetails, cli
return nil
}
// injectChaosInSerialMode will inject the azure instance termination in serial mode that is one after the other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode will inject the Azure instance termination in serial mode that is one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
@@ -88,7 +98,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Infof("[Info]: Target instanceName list, %v", instanceNameList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on azure instance"
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on Azure instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
@@ -100,25 +110,25 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to stop the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to stop the Azure instance")
}
}
// Wait for Azure instance to completely stop
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance poweroff status check failed, err: %v", err)
return stacktrace.Propagate(err, "instance poweroff status check failed")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -130,18 +140,18 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Info("[Chaos]: Starting back the Azure instance")
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
}
// Wait for Azure instance to get in running state
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance power on status check failed, err: %v", err)
return stacktrace.Propagate(err, "instance power on status check failed")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@@ -150,8 +160,11 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
return nil
}
// injectChaosInParallelMode will inject the azure instance termination in parallel mode that is all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInParallelMode will inject the Azure instance termination in parallel mode that is all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceNameList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAzureInstanceStopFaultInParallelMode")
defer span.End()
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
@@ -177,11 +190,11 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Infof("[Chaos]: Stopping the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to stop Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to stop azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to stop Azure instance")
}
}
}
@@ -190,14 +203,14 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the stopped state", vmName)
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance poweroff status check failed, err: %v", err)
return stacktrace.Propagate(err, "instance poweroff status check failed")
}
}
// Run probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@@ -210,11 +223,11 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
log.Infof("[Chaos]: Starting back the Azure instance: %v", vmName)
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("unable to start the Azure instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the Azure instance")
}
}
}
@@ -223,7 +236,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, vmName := range instanceNameList {
log.Infof("[Wait]: Waiting for Azure instance '%v' to get in the running state", vmName)
if err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
return errors.Errorf("instance power on status check failed, err: %v", err)
return stacktrace.Propagate(err, "instance power on status check failed")
}
}
@@ -248,22 +261,22 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
instanceState, err = azureStatus.GetAzureInstanceStatus(experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
}
if err != nil {
log.Errorf("[Abort]: Fail to get instance status when an abort signal is received, err: %v", err)
log.Errorf("[Abort]: Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "VM running" && instanceState != "VM starting" {
log.Info("[Abort]: Waiting for the Azure instance to get down")
if err := azureStatus.WaitForAzureComputeDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Instance power off status check failed, err: %v", err)
log.Errorf("[Abort]: Instance power off status check failed: %v", err)
}
log.Info("[Abort]: Starting Azure instance as abort signal received")
if experimentsDetails.ScaleSet == "enable" {
if err := azureStatus.AzureScaleSetInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance, err: %v", err)
log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
}
} else {
if err := azureStatus.AzureInstanceStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName); err != nil {
log.Errorf("[Abort]: Unable to start the Azure instance, err: %v", err)
log.Errorf("[Abort]: Unable to start the Azure instance: %v", err)
}
}
}
@@ -271,7 +284,7 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
log.Info("[Abort]: Waiting for the Azure instance to start")
err := azureStatus.WaitForAzureComputeUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup, vmName)
if err != nil {
log.Errorf("[Abort]: Instance power on status check failed, err: %v", err)
log.Errorf("[Abort]: Instance power on status check failed: %v", err)
log.Errorf("[Abort]: Azure instance %v failed to start after an abort signal is received", vmName)
}
}


@ -1,28 +1,38 @@
package helper
import (
"bytes"
"context"
"fmt"
"os/exec"
"strconv"
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
)
var err error
// Helper injects the container-kill chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateContainerKillFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -33,14 +43,18 @@ func Helper(clients clients.ClientSets) {
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Intialise the chaos attributes
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise Chaos Result Parameters
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
err := killContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
if err := killContainer(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
@@ -49,6 +63,33 @@ func Helper(clients clients.ClientSets) {
// it will kill the container till the chaos duration
// the execution will stop after timestamp passes the given chaos duration
func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
targets = append(targets, td)
log.Infof("Injecting chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
if err := killIterations(targets, experimentsDetails, clients, eventsDetails, chaosDetails, resultDetails); err != nil {
return err
}
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil
}
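The rewritten `killContainer` above first fans the parsed target list out into per-pod `targetDetails` records, stamping the helper pod name on each so later errors can name their source. A stripped-down sketch of that fan-out, with a hypothetical `parsedTarget` type standing in for the entries `common.ParseTargets` returns:

```go
package main

import "fmt"

// targetDetails mirrors the struct defined later in this helper file.
type targetDetails struct {
	Name, Namespace, TargetContainer, Source string
}

// parsedTarget is a hypothetical stand-in for one parsed target entry.
type parsedTarget struct {
	Name, Namespace, TargetContainer string
}

// buildTargets converts parsed entries into targetDetails, stamping the
// helper pod name as the error source on every record.
func buildTargets(parsed []parsedTarget, source string) []targetDetails {
	var targets []targetDetails
	for _, t := range parsed {
		targets = append(targets, targetDetails{
			Name:            t.Name,
			Namespace:       t.Namespace,
			TargetContainer: t.TargetContainer,
			Source:          source,
		})
	}
	return targets
}

func main() {
	ts := buildTargets([]parsedTarget{{"nginx-0", "default", "nginx"}}, "helper-pod")
	fmt.Printf("%+v\n", ts)
}
```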
func killIterations(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@@ -56,43 +97,30 @@ func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, client
for duration < experimentsDetails.ChaosDuration {
//getRestartCount return the restart count of target container
restartCountBefore, err := getRestartCount(experimentsDetails, experimentsDetails.TargetPods, clients)
if err != nil {
return err
}
var containerIds []string
//Obtain the container ID through Pod
// this id will be used to select the container for the kill
containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients)
if err != nil {
return errors.Errorf("Unable to get the container id, %v", err)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": experimentsDetails.TargetPods,
"ContainerName": experimentsDetails.TargetContainer,
"RestartCountBefore": restartCountBefore,
})
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch experimentsDetails.ContainerRuntime {
case "docker":
if err := stopDockerContainer(containerID, experimentsDetails.SocketPath, experimentsDetails.Signal); err != nil {
return err
for _, t := range targets {
t.RestartCountBefore, err = getRestartCount(t, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get container restart count")
}
case "containerd", "crio":
if err := stopContainerdContainer(containerID, experimentsDetails.SocketPath, experimentsDetails.Signal); err != nil {
return err
containerId, err := common.GetContainerID(t.Namespace, t.Name, t.TargetContainer, clients, t.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
default:
return errors.Errorf("%v container runtime not supported", experimentsDetails.ContainerRuntime)
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": t.Name,
"ContainerName": t.TargetContainer,
"RestartCountBefore": t.RestartCountBefore,
})
containerIds = append(containerIds, containerId)
}
if err := kill(experimentsDetails, containerIds, clients, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not kill target container")
}
//Waiting for the chaos interval after chaos injection
@@ -101,67 +129,93 @@ func killContainer(experimentsDetails *experimentTypes.ExperimentDetails, client
common.WaitForDuration(experimentsDetails.ChaosInterval)
}
//Check the status of restarted container
err = common.CheckContainerStatus(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("application container is not in running state, %v", err)
for _, t := range targets {
if err := validate(t, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not verify restart count")
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "targeted", "pod", t.Name); err != nil {
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// It will verify that the restart count of container should increase after chaos injection
err = verifyRestartCount(experimentsDetails, experimentsDetails.TargetPods, clients, restartCountBefore)
if err != nil {
return err
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "targeted", "pod", experimentsDetails.TargetPods); err != nil {
return nil
}
func kill(experimentsDetails *experimentTypes.ExperimentDetails, containerIds []string, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch experimentsDetails.ContainerRuntime {
case "docker":
if err := stopDockerContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
case "containerd", "crio":
if err := stopContainerdContainer(containerIds, experimentsDetails.SocketPath, experimentsDetails.Signal, experimentsDetails.ChaosPodName, experimentsDetails.Timeout); err != nil {
if isContextDeadlineExceeded(err) {
return nil
}
return stacktrace.Propagate(err, "could not stop container")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported container runtime %s", experimentsDetails.ContainerRuntime)}
}
return nil
}
func validate(t targetDetails, timeout, delay int, clients clients.ClientSets) error {
//Check the status of restarted container
if err := common.CheckContainerStatus(t.Namespace, t.Name, timeout, delay, clients, t.Source); err != nil {
return err
}
log.Infof("[Completion]: %v chaos has been completed", experimentsDetails.ExperimentName)
return nil
// It will verify that the restart count of container should increase after chaos injection
return verifyRestartCount(t, timeout, delay, clients, t.RestartCountBefore)
}
//stopContainerdContainer kill the application container
func stopContainerdContainer(containerID, socketPath, signal string) error {
var errOut bytes.Buffer
var cmd *exec.Cmd
endpoint := "unix://" + socketPath
switch signal {
case "SIGKILL":
cmd = exec.Command("sudo", "crictl", "-i", endpoint, "-r", endpoint, "stop", "--timeout=0", string(containerID))
case "SIGTERM":
cmd = exec.Command("sudo", "crictl", "-i", endpoint, "-r", endpoint, "stop", string(containerID))
default:
return errors.Errorf("{%v} signal not supported, use either SIGTERM or SIGKILL", signal)
// stopContainerdContainer kill the application container
func stopContainerdContainer(containerIDs []string, socketPath, signal, source string, timeout int) error {
if signal != "SIGKILL" && signal != "SIGTERM" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: source, Reason: fmt.Sprintf("unsupported signal %s, use either SIGTERM or SIGKILL", signal)}
}
cmd.Stderr = &errOut
if err := cmd.Run(); err != nil {
return errors.Errorf("Unable to run command, err: %v; error output: %v", err, errOut.String())
cmd := exec.Command("sudo", "crictl", "-i", fmt.Sprintf("unix://%s", socketPath), "-r", fmt.Sprintf("unix://%s", socketPath), "stop")
if signal == "SIGKILL" {
cmd.Args = append(cmd.Args, "--timeout=0")
} else if timeout != -1 {
cmd.Args = append(cmd.Args, fmt.Sprintf("--timeout=%v", timeout))
}
return nil
cmd.Args = append(cmd.Args, containerIDs...)
return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
}
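The new `stopContainerdContainer` builds one `crictl stop` invocation for all container IDs, forcing `--timeout=0` for `SIGKILL` and honoring a non-negative timeout for `SIGTERM`. The argument assembly can be isolated as a pure function and checked without a runtime socket (`buildCrictlStopArgs` is an illustration-only helper, not part of litmus-go):

```go
package main

import (
	"errors"
	"fmt"
)

// buildCrictlStopArgs assembles the argument list handed to exec.Command:
// SIGKILL forces --timeout=0, SIGTERM honors a timeout other than -1,
// and any other signal is rejected up front.
func buildCrictlStopArgs(socketPath, signal string, timeout int, ids []string) ([]string, error) {
	if signal != "SIGKILL" && signal != "SIGTERM" {
		return nil, errors.New("unsupported signal, use either SIGTERM or SIGKILL")
	}
	endpoint := fmt.Sprintf("unix://%s", socketPath)
	args := []string{"sudo", "crictl", "-i", endpoint, "-r", endpoint, "stop"}
	if signal == "SIGKILL" {
		args = append(args, "--timeout=0")
	} else if timeout != -1 {
		args = append(args, fmt.Sprintf("--timeout=%v", timeout))
	}
	return append(args, ids...), nil
}

func main() {
	args, _ := buildCrictlStopArgs("/run/containerd/containerd.sock", "SIGKILL", -1, []string{"abc123"})
	fmt.Println(args)
}
```

Validating the signal before touching `exec.Command` keeps the unsupported-signal case a cheap, deterministic error, matching the early check in the diff.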
//stopDockerContainer kill the application container
func stopDockerContainer(containerID, socketPath, signal string) error {
var errOut bytes.Buffer
host := "unix://" + socketPath
cmd := exec.Command("sudo", "docker", "--host", host, "kill", string(containerID), "--signal", signal)
cmd.Stderr = &errOut
if err := cmd.Run(); err != nil {
return errors.Errorf("Unable to run command, err: %v; error output: %v", err, errOut.String())
}
return nil
// stopDockerContainer kill the application container
func stopDockerContainer(containerIDs []string, socketPath, signal, source string) error {
cmd := exec.Command("sudo", "docker", "--host", fmt.Sprintf("unix://%s", socketPath), "kill", "--signal", signal)
cmd.Args = append(cmd.Args, containerIDs...)
return common.RunCLICommands(cmd, source, "", "failed to stop container", cerrors.ErrorTypeChaosInject)
}
//getRestartCount return the restart count of target container
func getRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets) (int, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), podName, v1.GetOptions{})
// getRestartCount return the restart count of target container
func getRestartCount(target targetDetails, clients clients.ClientSets) (int, error) {
pod, err := clients.GetPod(target.Namespace, target.Name, 180, 2)
if err != nil {
return 0, err
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", target.Name, target.Namespace), Reason: err.Error()}
}
restartCount := 0
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
if container.Name == target.TargetContainer {
restartCount = int(container.RestartCount)
break
}
@@ -169,39 +223,36 @@ func getRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podN
return restartCount, nil
}
//verifyRestartCount verify the restart count of target container that it is restarted or not after chaos injection
func verifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, restartCountBefore int) error {
// verifyRestartCount verify the restart count of target container that it is restarted or not after chaos injection
func verifyRestartCount(t targetDetails, timeout, delay int, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0
return retry.
Times(uint(experimentsDetails.Timeout / experimentsDetails.Delay)).
Wait(time.Duration(experimentsDetails.Delay) * time.Second).
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), podName, v1.GetOptions{})
pod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return errors.Errorf("Unable to find the pod with name %v, err: %v", podName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
if container.Name == t.TargetContainer {
restartCountAfter = int(container.RestartCount)
break
}
}
if restartCountAfter <= restartCountBefore {
return errors.Errorf("Target container is not restarted")
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: "target container is not restarted after kill"}
}
log.Infof("restartCount of target container after chaos injection: %v", strconv.Itoa(restartCountAfter))
return nil
})
}
//getENV fetches all the env variables from the runner pod
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
@@ -213,4 +264,17 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.Signal = types.Getenv("SIGNAL", "SIGKILL")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "180"))
experimentDetails.ContainerAPITimeout, _ = strconv.Atoi(types.Getenv("CONTAINER_API_TIMEOUT", "-1"))
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
RestartCountBefore int
Source string
}
func isContextDeadlineExceeded(err error) bool {
return strings.Contains(err.Error(), "context deadline exceeded")
}


@@ -2,34 +2,40 @@ package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareContainerKill contains the prepration steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareContainerKillFault")
defer span.End()
targetPodList := apiv1.PodList{}
var err error
var podsAffectedPerc int
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//Setup the tunables if provided in range
//Set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The tunables are:", logrus.Fields{
@@ -37,33 +43,11 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
"Sequence": experimentsDetails.Sequence,
})
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc)
if experimentsDetails.NodeLabel == "" {
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@@ -74,28 +58,28 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -107,13 +91,12 @@ func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails,
}
// injectChaosInSerialMode kills the containers of all target application pods serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
var err error
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -123,112 +106,62 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode kills the containers of all target application pods in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
var err error
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectContainerKillFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateContainerKillFaultHelperPod")
defer span.End()
privilegedEnable := false
if experimentsDetails.ContainerRuntime == "crio" {
@@ -238,10 +171,10 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
@@ -272,7 +205,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name container-kill",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName),
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
@@ -287,17 +220,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env vars required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@@ -309,14 +248,17 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("STATUS_CHECK_DELAY", strconv.Itoa(experimentsDetails.Delay)).
SetEnv("STATUS_CHECK_TIMEOUT", strconv.Itoa(experimentsDetails.Timeout)).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("CONTAINER_API_TIMEOUT", strconv.Itoa(experimentsDetails.ContainerAPITimeout)).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
//SetChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value.
// SetChaosTunables will setup a random value within a given range of values
// If the value is not provided in range it'll setup the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)


@@ -11,6 +11,11 @@ import (
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
@@ -18,7 +23,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"k8s.io/apimachinery/pkg/api/resource"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -28,7 +32,9 @@ import (
var inject, abort chan os.Signal
// Helper injects the disk-fill chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulateDiskFillFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -51,6 +57,7 @@ func Helper(clients clients.ClientSets) {
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
@@ -59,57 +66,58 @@ func Helper(clients clients.ClientSets) {
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := diskFill(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
//diskFill contains steps to inject disk-fill chaos
// diskFill contains steps to inject disk-fill chaos
func diskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
// Derive the container id of the target container
containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients)
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return err
return stacktrace.Propagate(err, "could not parse targets")
}
// derive the used ephemeral storage size from the target container
du := fmt.Sprintf("sudo du /diskfill/%v", containerID)
cmd := exec.Command("/bin/bash", "-c", du)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(string(out))
return err
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
// Derive the container id of the target container
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.TargetPID, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return err
}
td.SizeToFill, err = getDiskSizeToFill(td, experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get disk size to fill")
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"SizeToFill(KB)": td.SizeToFill,
"TargetContainer": td.TargetContainer,
})
targets = append(targets, td)
}
ephemeralStorageDetails := string(out)
// filtering out the used ephemeral storage from the output of du command
usedEphemeralStorageSize, err := filterUsedEphemeralStorage(ephemeralStorageDetails)
if err != nil {
return errors.Errorf("unable to filter used ephemeral storage size, err: %v", err)
}
log.Infof("used ephemeral storage space: %vKB", strconv.Itoa(usedEphemeralStorageSize))
// GetEphemeralStorageAttributes derive the ephemeral storage attributes from the target container
ephemeralStorageLimit, err := getEphemeralStorageAttributes(experimentsDetails, clients)
if err != nil {
return err
}
if ephemeralStorageLimit == 0 && experimentsDetails.EphemeralStorageMebibytes == "0" {
return errors.Errorf("either provide ephemeral storage limit inside target container or define EPHEMERAL_STORAGE_MEBIBYTES ENV")
}
// deriving the ephemeral storage size to be filled
sizeTobeFilled := getSizeToBeFilled(experimentsDetails, usedEphemeralStorageSize, int(ephemeralStorageLimit))
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": experimentsDetails.TargetPods,
"ContainerName": experimentsDetails.TargetContainer,
"ephemeralStorageLimit(KB)": ephemeralStorageLimit,
"ContainerID": containerID,
})
log.Infof("ephemeral storage size to be filled: %vKB", strconv.Itoa(sizeTobeFilled))
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
@@ -119,65 +127,80 @@ func diskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients cli
}
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, clients, containerID, resultDetails.Name)
if sizeTobeFilled > 0 {
if err := fillDisk(containerID, sizeTobeFilled, experimentsDetails.DataBlockSize); err != nil {
log.Error(string(out))
return err
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment")
// It will delete the target pod if target pod is evicted
// if target pod is still running then it will delete all the files, which was created earlier during chaos execution
err = remedy(experimentsDetails, clients, containerID)
if err != nil {
return errors.Errorf("unable to perform remedy operation, err: %v", err)
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
} else {
log.Warn("No required free space found!, It's Housefull")
}
return nil
}
// fillDisk fills the ephemeral disk by creating files
func fillDisk(containerID string, sizeTobeFilled, bs int) error {
go abortWatcher(targets, experimentsDetails, clients, resultDetails.Name)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
// Creating files to fill the required ephemeral storage size of block size of 4K
log.Infof("[Fill]: Filling ephemeral storage, size: %vKB", sizeTobeFilled)
dd := fmt.Sprintf("sudo dd if=/dev/urandom of=/diskfill/%v/diskfill bs=%vK count=%v", containerID, bs, strconv.Itoa(sizeTobeFilled/bs))
log.Infof("dd: {%v}", dd)
cmd := exec.Command("/bin/bash", "-c", dd)
_, err := cmd.CombinedOutput()
return err
}
for _, t := range targets {
if t.SizeToFill > 0 {
if err := fillDisk(t, experimentsDetails.DataBlockSize); err != nil {
return stacktrace.Propagate(err, "could not fill ephemeral storage")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertDiskFill(t, clients); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
} else {
log.Warn("No required free space found!")
}
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment")
var errList []string
for _, t := range targets {
// It will delete the target pod if target pod is evicted
// if target pod is still running then it will delete all the files, which was created earlier during chaos execution
if err = revertDiskFill(t, clients); err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// fillDisk fills the ephemeral disk by creating files
func fillDisk(t targetDetails, bs int) error {
// Creating files to fill the required ephemeral storage size of block size of 4K
log.Infof("[Fill]: Filling ephemeral storage, size: %vKB", t.SizeToFill)
dd := fmt.Sprintf("sudo dd if=/dev/urandom of=/proc/%v/root/home/diskfill bs=%vK count=%v", t.TargetPID, bs, strconv.Itoa(t.SizeToFill/bs))
log.Infof("dd: {%v}", dd)
cmd := exec.Command("/bin/bash", "-c", dd)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: string(out)}
}
return nil
}
// getEphemeralStorageAttributes derives the ephemeral storage attributes from the target pod
func getEphemeralStorageAttributes(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int64, error) {
func getEphemeralStorageAttributes(t targetDetails, clients clients.ClientSets) (int64, error) {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), experimentsDetails.TargetPods, v1.GetOptions{})
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return 0, err
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
var ephemeralStorageLimit int64
@@ -186,7 +209,7 @@ func getEphemeralStorageAttributes(experimentsDetails *experimentTypes.Experimen
// Extracting ephemeral storage limit & requested value from the target container
// It will be in the form of Kb
for _, container := range containers {
if container.Name == experimentsDetails.TargetContainer {
if container.Name == t.TargetContainer {
ephemeralStorageLimit = container.Resources.Limits.StorageEphemeral().ToDec().ScaledValue(resource.Kilo)
break
}
@@ -203,7 +226,7 @@ func filterUsedEphemeralStorage(ephemeralStorageDetails string) (int, error) {
ephemeralStorageAll := strings.Split(ephemeralStorageDetails, "\n")
// It will return the details of main directory
ephemeralStorageAllDiskFill := strings.Split(ephemeralStorageAll[len(ephemeralStorageAll)-2], "\t")[0]
// type casting string to interger
// type casting string to integer
ephemeralStorageSize, err := strconv.Atoi(ephemeralStorageAllDiskFill)
return ephemeralStorageSize, err
}
@@ -226,40 +249,38 @@ func getSizeToBeFilled(experimentsDetails *experimentTypes.ExperimentDetails, us
return needToBeFilled
}
// remedy will delete the target pod if target pod is evicted
// revertDiskFill will delete the target pod if target pod is evicted
// if target pod is still running then it will delete the files, which was created during chaos execution
func remedy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, containerID string) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), experimentsDetails.TargetPods, v1.GetOptions{})
func revertDiskFill(t targetDetails, clients clients.ClientSets) error {
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
// Deleting the pod as pod is already evicted
podReason := pod.Status.Reason
if podReason == "Evicted" {
// Deleting the pod as pod is already evicted
log.Warn("Target pod is evicted, deleting the pod")
if err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), experimentsDetails.TargetPods, v1.DeleteOptions{}); err != nil {
return err
if err := clients.KubeClient.CoreV1().Pods(t.Namespace).Delete(context.Background(), t.Name, v1.DeleteOptions{}); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to delete target pod after eviction :%s", err.Error())}
}
} else {
// deleting the files after chaos execution
rm := fmt.Sprintf("sudo rm -rf /diskfill/%v/diskfill", containerID)
rm := fmt.Sprintf("sudo rm -rf /proc/%v/root/home/diskfill", t.TargetPID)
cmd := exec.Command("/bin/bash", "-c", rm)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(string(out))
return err
log.Error(err.Error())
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to cleanup ephemeral storage: %s", string(out))}
}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
return nil
}
//getENV fetches all the env variables from the runner pod
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@@ -268,10 +289,12 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.FillPercentage = types.Getenv("FILL_PERCENTAGE", "")
experimentDetails.EphemeralStorageMebibytes = types.Getenv("EPHEMERAL_STORAGE_MEBIBYTES", "")
experimentDetails.DataBlockSize, _ = strconv.Atoi(types.Getenv("DATA_BLOCK_SIZE", "256"))
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
}
// abortWatcher continuously watches for the abort signals
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, containerID, resultName string) {
func abortWatcher(targets []targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultName string) {
// waiting till the abort signal received
<-abort
@@ -280,15 +303,72 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err := remedy(experimentsDetails, clients, containerID); err != nil {
log.Errorf("unable to perform remedy operation, err: %v", err)
for _, t := range targets {
err := revertDiskFill(t, clients)
if err != nil {
log.Errorf("unable to kill disk-fill process, err :%v", err)
continue
}
if err = result.AnnotateChaosResult(resultName, experimentsDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
}
retry--
time.Sleep(1 * time.Second)
}
if err := result.AnnotateChaosResult(resultName, experimentsDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
func getDiskSizeToFill(t targetDetails, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int, error) {
usedEphemeralStorageSize, err := getUsedEphemeralStorage(t)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get used ephemeral storage")
}
// GetEphemeralStorageAttributes derive the ephemeral storage attributes from the target container
ephemeralStorageLimit, err := getEphemeralStorageAttributes(t, clients)
if err != nil {
return 0, stacktrace.Propagate(err, "could not get ephemeral storage attributes")
}
if ephemeralStorageLimit == 0 && experimentsDetails.EphemeralStorageMebibytes == "0" {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: "either provide ephemeral storage limit inside target container or define EPHEMERAL_STORAGE_MEBIBYTES ENV"}
}
// deriving the ephemeral storage size to be filled
sizeTobeFilled := getSizeToBeFilled(experimentsDetails, usedEphemeralStorageSize, int(ephemeralStorageLimit))
return sizeTobeFilled, nil
}
func getUsedEphemeralStorage(t targetDetails) (int, error) {
// derive the used ephemeral storage size from the target container
du := fmt.Sprintf("sudo du /proc/%v/root", t.TargetPID)
cmd := exec.Command("/bin/bash", "-c", du)
out, err := cmd.CombinedOutput()
if err != nil {
log.Error(err.Error())
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", string(out))}
}
ephemeralStorageDetails := string(out)
// filtering out the used ephemeral storage from the output of du command
usedEphemeralStorageSize, err := filterUsedEphemeralStorage(ephemeralStorageDetails)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to get used ephemeral storage size: %s", err.Error())}
}
log.Infof("used ephemeral storage space: %vKB", strconv.Itoa(usedEphemeralStorageSize))
return usedEphemeralStorageSize, nil
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
SizeToFill int
TargetPID int
Source string
}
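The `du`-based measurement above leaves `filterUsedEphemeralStorage` to pull the cumulative size out of the raw command output. A minimal sketch of that filtering, assuming `du <path>` prints one line per directory with the final line carrying the total for the path in KB; `filterUsedEphemeralStorage` here is an illustrative stand-in, not the repository's actual implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// filterUsedEphemeralStorage extracts the used size (in KB) from raw `du`
// output. `du <path>` prints one line per directory, each formatted as
// "<size-in-KB>\t<path>"; the final line is the cumulative total for <path>.
func filterUsedEphemeralStorage(duOutput string) (int, error) {
	lines := strings.Split(strings.TrimSpace(duOutput), "\n")
	fields := strings.Fields(lines[len(lines)-1])
	if len(fields) < 2 {
		return 0, fmt.Errorf("unexpected du line: %q", lines[len(lines)-1])
	}
	return strconv.Atoi(fields[0])
}

func main() {
	out := "120\t/proc/42/root/var\n4096\t/proc/42/root\n"
	kb, err := filterUsedEphemeralStorage(out)
	fmt.Println(kb, err) // prints: 4096 <nil>
}
```

If the `du` output format differs (e.g. `-h` human-readable sizes), the first field would need unit parsing instead of a bare `Atoi`.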


@ -2,37 +2,43 @@ package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareDiskFill contains the prepration steps before chaos injection
func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareDiskFill contains the preparation steps before chaos injection
func PrepareDiskFill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDiskFillFault")
defer span.End()
targetPodList := apiv1.PodList{}
var err error
var podsAffectedPerc int
// It will contains all the pod & container details required for exec command
// It will contain all the pod & container details required for exec command
execCommandDetails := exec.PodDetails{}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//setup the tunables if provided in range
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
@ -42,33 +48,11 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
"Sequence": experimentsDetails.Sequence,
})
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc)
if experimentsDetails.NodeLabel == "" {
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@ -79,28 +63,28 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, execCommandDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -112,13 +96,12 @@ func PrepareDiskFill(experimentsDetails *experimentTypes.ExperimentDetails, clie
}
// injectChaosInSerialMode fill the ephemeral storage of all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
var err error
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -128,39 +111,18 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@ -169,86 +131,69 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode fill the ephemeral storage of of all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform disk-fill chaos
for _, pod := range targetPodList.Items {
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, appNodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDiskFillFaultHelperPod")
defer span.End()
mountPropagationMode := apiv1.MountPropagationHostToContainer
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
Volumes: []apiv1.Volume{
{
Name: "udev",
Name: "socket-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.ContainerPath,
Path: experimentsDetails.SocketPath,
},
},
},
@ -266,29 +211,38 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name disk-fill",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, appName),
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "udev",
MountPath: "/diskfill",
MountPropagation: &mountPropagationMode,
Name: "socket-path",
MountPath: experimentsDetails.SocketPath,
},
},
SecurityContext: &apiv1.SecurityContext{
Privileged: &privilegedEnable,
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derive all the env required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
SetEnv("APP_POD", podName).
envDetails.SetEnv("TARGETS", targets).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
@ -299,13 +253,17 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("EPHEMERAL_STORAGE_MEBIBYTES", experimentsDetails.EphemeralStorageMebibytes).
SetEnv("DATA_BLOCK_SIZE", strconv.Itoa(experimentsDetails.DataBlockSize)).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("CONTAINER_RUNTIME", experimentsDetails.ContainerRuntime).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
//setChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value.
// setChaosTunables will setup a random value within a given range of values
// If the value is not provided in range it'll setup the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.FillPercentage = common.ValidateRange(experimentsDetails.FillPercentage)
experimentsDetails.EphemeralStorageMebibytes = common.ValidateRange(experimentsDetails.EphemeralStorageMebibytes)
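The tunables validated above feed the `getSizeToBeFilled` derivation used earlier in this file. A hedged sketch of that arithmetic, assuming an absolute `EPHEMERAL_STORAGE_MEBIBYTES` target takes precedence over `FILL_PERCENTAGE` of the container's ephemeral-storage limit; the name `sizeToBeFilledKB` and the integer rounding are illustrative assumptions, not the repository's exact code:

```go
package main

import (
	"fmt"
	"strconv"
)

// sizeToBeFilledKB derives how many KB to write so the container reaches the
// requested ephemeral-storage usage: either an absolute MiB target, or a
// percentage of the container's ephemeral-storage limit, minus what is
// already used.
func sizeToBeFilledKB(fillPercentage, ephemeralStorageMebibytes string, usedKB, limitKB int) int {
	var requiredKB int
	if ephemeralStorageMebibytes != "0" {
		mib, _ := strconv.Atoi(ephemeralStorageMebibytes)
		requiredKB = mib * 1024 // MiB -> KB
	} else {
		pct, _ := strconv.Atoi(fillPercentage)
		requiredKB = limitKB * pct / 100
	}
	return requiredKB - usedKB
}

func main() {
	// Fill to 80% of a 1 GiB limit when 200 MiB is already used.
	fmt.Println(sizeToBeFilledKB("80", "0", 200*1024, 1024*1024)) // prints: 634060
}
```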


@ -2,31 +2,37 @@ package lib
import (
"context"
"fmt"
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/docker-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareDockerServiceKill contains prepration steps before chaos injection
func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareDockerServiceKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareDockerServiceKillFault")
defer span.End()
var err error
if experimentsDetails.TargetNode == "" {
//Select node for docker-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@ -34,7 +40,7 @@ func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDeta
"NodeName": experimentsDetails.TargetNode,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@ -50,52 +56,19 @@ func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDeta
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
// Creating the helper pod to perform docker-service-kill
if err = createHelperPod(experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
}
// Checking for the node to be in not-ready state
log.Info("[Status]: Check for the node to be in NotReady state")
if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("application node is not in NotReady state, err: %v", err)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@ -107,7 +80,9 @@ func PrepareDockerServiceKill(experimentsDetails *experimentTypes.ExperimentDeta
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateDockerServiceKillFaultHelperPod")
defer span.End()
privileged := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
@ -116,7 +91,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, "", experimentsDetails.ExperimentName),
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
@ -188,8 +163,16 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
func ptrint64(p int64) *int64 {

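The refactor above collapses the status-check / wait-for-completion / cleanup sequence into a single `common.ManagerHelperLifecycle` call. A minimal sketch of what such a consolidated lifecycle helper can look like; `helperOps` and `manageHelperLifecycle` are illustrative stand-ins for the common package's API, with the Kubernetes calls stubbed out:

```go
package main

import (
	"errors"
	"fmt"
)

// helperOps stubs the three Kubernetes-facing steps of the helper lifecycle.
type helperOps struct {
	checkRunning   func(label string) error           // pod reached Running?
	waitCompletion func(label string) (string, error) // final pod phase
	deletePods     func(label string) error           // cleanup by label
}

// manageHelperLifecycle runs the consolidated flow: verify the helper is
// running, wait for it to finish, and clean up on failure or (optionally)
// on success, mirroring the job-cleanup policy.
func manageHelperLifecycle(label string, ops helperOps, cleanupOnExit bool) error {
	if err := ops.checkRunning(label); err != nil {
		_ = ops.deletePods(label) // best-effort cleanup on failure
		return fmt.Errorf("helper pods not in running state: %w", err)
	}
	phase, err := ops.waitCompletion(label)
	if err != nil || phase == "Failed" {
		_ = ops.deletePods(label)
		return errors.New("helper pod failed")
	}
	if cleanupOnExit {
		return ops.deletePods(label)
	}
	return nil
}

func main() {
	ops := helperOps{
		checkRunning:   func(string) error { return nil },
		waitCompletion: func(string) (string, error) { return "Succeeded", nil },
		deletePods:     func(string) error { return nil },
	}
	fmt.Println(manageHelperLifecycle("app=disk-fill-helper-abc12", ops, true)) // prints: <nil>
}
```

Folding the three steps behind one call is what lets every fault in this changeset drop its copy-pasted status/wait/delete block.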

@ -1,18 +1,23 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@ -20,8 +25,10 @@ var (
inject, abort chan os.Signal
)
//PrepareEBSLossByID contains the prepration and injection steps for the experiment
func PrepareEBSLossByID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareEBSLossByID contains the prepration and injection steps for the experiment
func PrepareEBSLossByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -48,22 +55,22 @@ func PrepareEBSLossByID(experimentsDetails *experimentTypes.ExperimentDetails, c
//get the volume id or list of instance ids
volumeIDList := strings.Split(experimentsDetails.EBSVolumeID, ",")
if len(volumeIDList) == 0 {
return errors.Errorf("no volume id found to detach")
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no volume id found to detach"}
}
// watching for the abort signal and revert the chaos
go ebsloss.AbortWatcher(experimentsDetails, volumeIDList, abort, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = ebsloss.InjectChaosInSerialMode(experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = ebsloss.InjectChaosInParallelMode(experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, volumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection


@ -1,18 +1,23 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
ebsloss "github.com/litmuschaos/litmus-go/chaoslib/litmus/ebs-loss/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@ -20,8 +25,10 @@ var (
inject, abort chan os.Signal
)
//PrepareEBSLossByTag contains the prepration and injection steps for the experiment
func PrepareEBSLossByTag(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareEBSLossByTag contains the prepration and injection steps for the experiment
func PrepareEBSLossByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEBSLossFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -53,15 +60,15 @@ func PrepareEBSLossByTag(experimentsDetails *experimentTypes.ExperimentDetails,
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = ebsloss.InjectChaosInSerialMode(experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = ebsloss.InjectChaosInSerialMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = ebsloss.InjectChaosInParallelMode(experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = ebsloss.InjectChaosInParallelMode(ctx, experimentsDetails, targetEBSVolumeIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {

View File

@ -1,22 +1,29 @@
package lib
import (
"context"
"fmt"
"os"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
ebs "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ebs"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ebs-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
//InjectChaosInSerialMode will inject the ebs loss chaos in serial mode, which means one after the other
func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// InjectChaosInSerialMode will inject the ebs loss chaos in serial mode, which means one after the other
func InjectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
@ -34,13 +41,13 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get volume attachment details
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
return errors.Errorf("fail to get the attachment info, err: %v", err)
return stacktrace.Propagate(err, "failed to get the attachment info")
}
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err = ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
return errors.Errorf("ebs detachment failed, err: %v", err)
return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
@ -48,14 +55,14 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err = ebs.WaitForVolumeDetachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the ebs volume to the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ebs detachment failed")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -66,7 +73,7 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceID, experimentsDetails.Region)
if err != nil {
return errors.Errorf("failed to get the ebs status, err: %v", err)
return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
@ -76,13 +83,13 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Attaching the ebs volume back to the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceID, device, experimentsDetails.Region); err != nil {
return errors.Errorf("ebs attachment failed, err: %v", err)
return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for %v volume", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the ebs volume to the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
@ -92,8 +99,10 @@ func InjectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
return nil
}
//InjectChaosInParallelMode will inject the chaos in parallel mode, that is, all at once
func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// InjectChaosInParallelMode will inject the chaos in parallel mode, that is, all at once
func InjectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetEBSVolumeIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEBSLossFaultInParallelMode")
defer span.End()
var ec2InstanceIDList, deviceList []string
@ -112,8 +121,15 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//prepare the instance IDs and device names for all the given volumes
for _, volumeID := range targetEBSVolumeIDList {
ec2InstanceID, device, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil || ec2InstanceID == "" || device == "" {
return errors.Errorf("fail to get the attachment info, err: %v", err)
if err != nil {
return stacktrace.Propagate(err, "failed to get the attachment info")
}
if ec2InstanceID == "" || device == "" {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: "Volume not attached to any instance",
Target: fmt.Sprintf("EBS Volume ID: %v", volumeID),
}
}
ec2InstanceIDList = append(ec2InstanceIDList, ec2InstanceID)
deviceList = append(deviceList, device)
@ -123,28 +139,28 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Detaching the ebs volume from the instance
log.Info("[Chaos]: Detaching the EBS volume from the instance")
if err := ebs.EBSVolumeDetach(volumeID, experimentsDetails.Region); err != nil {
return errors.Errorf("ebs detachment failed, err: %v", err)
return stacktrace.Propagate(err, "ebs detachment failed")
}
common.SetTargets(volumeID, "injected", "EBS", chaosDetails)
}
log.Info("[Info]: Checking if the detachment process initiated")
if err := ebs.CheckEBSDetachmentInitialisation(targetEBSVolumeIDList, ec2InstanceIDList, experimentsDetails.Region); err != nil {
return errors.Errorf("fail to initialise the detachment")
return stacktrace.Propagate(err, "failed to initialise the detachment")
}
for i, volumeID := range targetEBSVolumeIDList {
//Wait for ebs volume detachment
log.Infof("[Wait]: Wait for EBS volume detachment for volume %v", volumeID)
if err := ebs.WaitForVolumeDetachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the ebs volume to the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ebs detachment failed")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -157,7 +173,7 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(volumeID, ec2InstanceIDList[i], experimentsDetails.Region)
if err != nil {
return errors.Errorf("failed to get the ebs status, err: %v", err)
return stacktrace.Propagate(err, "failed to get the ebs status")
}
switch ebsState {
@ -167,13 +183,13 @@ func InjectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Attaching the ebs volume back to the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
if err = ebs.EBSVolumeAttach(volumeID, ec2InstanceIDList[i], deviceList[i], experimentsDetails.Region); err != nil {
return errors.Errorf("ebs attachment failed, err: %v", err)
return stacktrace.Propagate(err, "ebs attachment failed")
}
//Wait for ebs volume attachment
log.Infof("[Wait]: Wait for EBS volume attachment for volume %v", volumeID)
if err = ebs.WaitForVolumeAttachment(volumeID, ec2InstanceIDList[i], experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the ebs volume to the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ebs attachment failed")
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)
@ -193,13 +209,13 @@ func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, volumeI
//Get volume attachment details
instanceID, deviceName, err := ebs.GetVolumeAttachmentDetails(volumeID, experimentsDetails.VolumeTag, experimentsDetails.Region)
if err != nil {
log.Errorf("fail to get the attachment info, err: %v", err)
log.Errorf("Failed to get the attachment info: %v", err)
}
//Getting the EBS volume attachment status
ebsState, err := ebs.GetEBSStatus(experimentsDetails.EBSVolumeID, instanceID, experimentsDetails.Region)
if err != nil {
log.Errorf("failed to get the ebs status when an abort signal is received, err: %v", err)
log.Errorf("Failed to get the ebs status when an abort signal is received: %v", err)
}
if ebsState != "attached" {
@ -207,13 +223,13 @@ func AbortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, volumeI
//We first wait for the volume to reach the detached state, then we attach it back.
log.Info("[Abort]: Wait for EBS complete volume detachment")
if err = ebs.WaitForVolumeDetachment(experimentsDetails.EBSVolumeID, instanceID, experimentsDetails.Region, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("unable to detach the ebs volume, err: %v", err)
log.Errorf("Unable to detach the ebs volume: %v", err)
}
//Attaching the ebs volume back to the instance
log.Info("[Chaos]: Attaching the EBS volume back to the instance")
err = ebs.EBSVolumeAttach(experimentsDetails.EBSVolumeID, instanceID, deviceName, experimentsDetails.Region)
if err != nil {
log.Errorf("ebs attachment failed when an abort signal is received, err: %v", err)
log.Errorf("EBS attachment failed when an abort signal is received: %v", err)
}
}
common.SetTargets(volumeID, "reverted", "EBS", chaosDetails)

View File

@ -1,21 +1,26 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-id/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var (
@ -23,8 +28,10 @@ var (
inject, abort chan os.Signal
)
//PrepareEC2TerminateByID contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareEC2TerminateByID contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByID(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByID")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -44,8 +51,8 @@ func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetai
//get the instance id or list of instance ids
instanceIDList := strings.Split(experimentsDetails.Ec2InstanceID, ",")
if len(instanceIDList) == 0 {
return errors.Errorf("no instance id found to terminate")
if experimentsDetails.Ec2InstanceID == "" || len(instanceIDList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no EC2 instance ID found to terminate"}
}
// watching for the abort signal and revert the chaos
@ -53,15 +60,15 @@ func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetai
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -72,8 +79,10 @@ func PrepareEC2TerminateByID(experimentsDetails *experimentTypes.ExperimentDetai
return nil
}
//injectChaosInSerialMode will inject the ec2 instance termination in serial mode, that is, one after the other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode will inject the ec2 instance termination in serial mode, that is, one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInSerialMode")
defer span.End()
select {
case <-inject:
@ -100,7 +109,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to stop, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
@ -108,14 +117,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -127,13 +136,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to start, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to start the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
@ -145,7 +154,9 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode will inject the ec2 instance termination in parallel mode, that is, all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByIDInParallelMode")
defer span.End()
select {
case <-inject:
@ -171,7 +182,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to stop, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
@ -180,15 +191,15 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "reverted", "EC2 Instance ID", chaosDetails)
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -202,7 +213,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to start, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
@ -210,7 +221,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to start the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
@ -232,19 +243,19 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
log.Errorf("fail to get instance status when an abort signal is received,err :%v", err)
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
log.Errorf("unable to wait till stop of the instance, err: %v", err)
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
log.Errorf("ec2 instance failed to start when an abort signal is received, err: %v", err)
log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)

View File

@ -1,28 +1,35 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/ec2-terminate-by-tag/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
var inject, abort chan os.Signal
//PrepareEC2TerminateByTag contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByTag(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareEC2TerminateByTag contains the preparation and injection steps for the experiment
func PrepareEC2TerminateByTag(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareAWSEC2TerminateFaultByTag")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -48,15 +55,15 @@ func PrepareEC2TerminateByTag(experimentsDetails *experimentTypes.ExperimentDeta
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err := injectChaosInSerialMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err := injectChaosInParallelMode(ctx, experimentsDetails, instanceIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -67,8 +74,10 @@ func PrepareEC2TerminateByTag(experimentsDetails *experimentTypes.ExperimentDeta
return nil
}
//injectChaosInSerialMode will inject the ec2 instance termination in serial mode, that is, one after the other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// injectChaosInSerialMode will inject the ec2 instance termination in serial mode, that is, one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInSerialMode")
defer span.End()
select {
case <-inject:
@ -95,7 +104,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to stop, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
@ -103,14 +112,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -122,13 +131,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
if experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to start, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to start the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)
@ -140,7 +149,9 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode will inject the ec2 instance termination in parallel mode, that is, all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectAWSEC2TerminateFaultByTagInParallelMode")
defer span.End()
select {
case <-inject:
@ -165,7 +176,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Stopping the EC2 instance
log.Info("[Chaos]: Stopping the desired EC2 instance")
if err := awslib.EC2Stop(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to stop, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
common.SetTargets(id, "injected", "EC2", chaosDetails)
}
@ -174,14 +185,14 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to completely stop
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in stopped state", id)
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to stop the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to stop")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -195,7 +206,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for _, id := range instanceIDList {
log.Info("[Chaos]: Starting back the EC2 instance")
if err := awslib.EC2Start(id, experimentsDetails.Region); err != nil {
return errors.Errorf("ec2 instance failed to start, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
@ -203,7 +214,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for ec2 instance to get in running state
log.Infof("[Wait]: Wait for EC2 instance '%v' to get in running state", id)
if err := awslib.WaitForEC2Up(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
return errors.Errorf("unable to start the ec2 instance, err: %v", err)
return stacktrace.Propagate(err, "ec2 instance failed to start")
}
}
}
@ -216,21 +227,24 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
return nil
}
//SetTargetInstance will select the target instance which are in running state and filtered from the given instance tag
// SetTargetInstance will select the target instances which are in running state, filtered from the given instance tag
func SetTargetInstance(experimentsDetails *experimentTypes.ExperimentDetails) error {
instanceIDList, err := awslib.GetInstanceList(experimentsDetails.InstanceTag, experimentsDetails.Region)
instanceIDList, err := awslib.GetInstanceList(experimentsDetails.Ec2InstanceTag, experimentsDetails.Region)
if err != nil {
return err
return stacktrace.Propagate(err, "failed to get the instance id list")
}
if len(instanceIDList) == 0 {
return errors.Errorf("no instance found with the given tag %v, in region %v", experimentsDetails.InstanceTag, experimentsDetails.Region)
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeTargetSelection,
Reason: fmt.Sprintf("no instance found with the given tag %v, in region %v", experimentsDetails.Ec2InstanceTag, experimentsDetails.Region),
}
}
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
return errors.Errorf("fail to get the instance status while selecting the target instances, err: %v", err)
return stacktrace.Propagate(err, "failed to get the instance status while selecting the target instances")
}
if instanceState == "running" {
experimentsDetails.TargetInstanceIDList = append(experimentsDetails.TargetInstanceIDList, id)
@ -238,7 +252,10 @@ func SetTargetInstance(experimentsDetails *experimentTypes.ExperimentDetails) er
}
if len(experimentsDetails.TargetInstanceIDList) == 0 {
return errors.Errorf("fail to get any running instance having instance tag: %v", experimentsDetails.InstanceTag)
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: "failed to get any running instance",
Target: fmt.Sprintf("EC2 Instance Tag: %v", experimentsDetails.Ec2InstanceTag)}
}
log.InfoWithValues("[Info]: Targeting the running instances filtered from instance tag", logrus.Fields{
@ -257,19 +274,19 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanc
for _, id := range instanceIDList {
instanceState, err := awslib.GetEC2InstanceStatus(id, experimentsDetails.Region)
if err != nil {
log.Errorf("fail to get instance status when an abort signal is received,err :%v", err)
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" && experimentsDetails.ManagedNodegroup != "enable" {
log.Info("[Abort]: Waiting for the EC2 instance to get down")
if err := awslib.WaitForEC2Down(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.ManagedNodegroup, experimentsDetails.Region, id); err != nil {
log.Errorf("unable to wait till stop of the instance, err: %v", err)
log.Errorf("Unable to wait for the instance to stop: %v", err)
}
log.Info("[Abort]: Starting EC2 instance as abort signal received")
err := awslib.EC2Start(id, experimentsDetails.Region)
if err != nil {
log.Errorf("ec2 instance failed to start when an abort signal is received, err: %v", err)
log.Errorf("EC2 instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(id, "reverted", "EC2", chaosDetails)


@ -1,21 +1,26 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
@ -25,7 +30,9 @@ var (
)
// PrepareDiskVolumeLossByLabel contains the preparation and injection steps for the experiment
func PrepareDiskVolumeLossByLabel(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareDiskVolumeLossByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPDiskVolumeLossFaultByLabel")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -61,15 +68,15 @@ func PrepareDiskVolumeLossByLabel(computeService *compute.Service, experimentsDe
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskVolumeNamesList, experimentsDetails.TargetDiskInstanceNamesList, experimentsDetails.Zones, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
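The serial/parallel switch in the hunk above can be sketched as a standalone program; the target names and the injection body are placeholders, assuming only the dispatch logic shown in the diff:

```go
package main

import (
	"fmt"
	"strings"
)

// runSequence mirrors the dispatch in the diff: the Sequence value picks
// how targets are processed, and anything else is rejected.
func runSequence(sequence string, targets []string) error {
	switch strings.ToLower(sequence) {
	case "serial":
		// one after the other
		for _, t := range targets {
			fmt.Println("injecting on:", t)
		}
	case "parallel":
		// all at once
		fmt.Println("injecting on:", strings.Join(targets, ", "))
	default:
		return fmt.Errorf("'%s' sequence is not supported", sequence)
	}
	return nil
}

func main() {
	_ = runSequence("Serial", []string{"disk-1", "disk-2"})
	fmt.Println(runSequence("burst", nil))
}
```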
@ -83,7 +90,9 @@ func PrepareDiskVolumeLossByLabel(computeService *compute.Service, experimentsDe
}
// injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
@ -102,7 +111,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
return errors.Errorf("disk detachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
@ -110,13 +119,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -128,7 +137,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
return errors.Errorf("failed to get the disk volume status, err: %v", err)
return stacktrace.Propagate(err, "failed to get the disk volume status")
}
switch diskState {
@ -138,13 +147,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume back to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return errors.Errorf("disk attachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for %v volume", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
@ -158,7 +167,9 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
}
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode, that is, all at once
func injectChaosInParallelMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, instanceNamesList []string, zone string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPDiskVolumeLossFaultByLabelInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
@ -177,7 +188,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
if err = gcp.DiskVolumeDetach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i]); err != nil {
return errors.Errorf("disk detachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
@ -188,13 +199,13 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to detach the disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -208,7 +219,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
return errors.Errorf("failed to get the disk status, err: %v", err)
return stacktrace.Propagate(err, "failed to get the disk status")
}
switch diskState {
@ -218,13 +229,13 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume to the instance")
if err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return errors.Errorf("disk attachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for volume %v", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to attach the disk volume to the vm instance")
}
}
@ -249,25 +260,25 @@ func abortWatcher(computeService *compute.Service, experimentsDetails *experimen
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone)
if err != nil {
log.Errorf("failed to get the disk state when an abort signal is received, err: %v", err)
log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to reach the detached state, then we attach it.
log.Info("[Abort]: Wait for complete disk volume detachment")
log.Infof("[Abort]: Wait for %s complete disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, instanceNamesList[i], zone, experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("unable to detach the disk volume, err: %v", err)
log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume from the instance")
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zone, experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
log.Errorf("disk attachment failed when an abort signal is received, err: %v", err)
log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}
@ -285,12 +296,12 @@ func getDeviceNamesAndVMInstanceNames(diskVolumeNamesList []string, computeServi
instanceName, err := gcp.GetVolumeAttachmentDetails(computeService, experimentsDetails.GCPProjectID, experimentsDetails.Zones, diskVolumeNamesList[i])
if err != nil || instanceName == "" {
return errors.Errorf("failed to get the attachment info, err: %v", err)
return stacktrace.Propagate(err, "failed to get the disk attachment info")
}
deviceName, err := gcp.GetDiskDeviceNameForVM(computeService, diskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones, instanceName)
if err != nil {
return err
return stacktrace.Propagate(err, "failed to fetch the disk device name")
}
experimentsDetails.TargetDiskInstanceNamesList = append(experimentsDetails.TargetDiskInstanceNamesList, instanceName)
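The diffs also migrate bare `errors.Errorf` calls to `cerrors.Error` values carrying an error code, a reason, and a target. A hypothetical stand-in for that shape, using only the standard library (the type, field values, and tag below are illustrative, not the real cerrors API):

```go
package main

import (
	"errors"
	"fmt"
)

// ChaosError is a stand-in for cerrors.Error: a typed error with a
// machine-readable code, a human-readable reason, and the affected target.
type ChaosError struct {
	Code   string
	Reason string
	Target string
}

func (e ChaosError) Error() string {
	return fmt.Sprintf("[%s] %s (target: %s)", e.Code, e.Reason, e.Target)
}

// selectTarget fails with a typed error when no running instance matches.
func selectTarget(running []string) error {
	if len(running) == 0 {
		return ChaosError{
			Code:   "CHAOS_INJECT",
			Reason: "failed to get any running instance",
			Target: "EC2 Instance Tag: app=demo",
		}
	}
	return nil
}

func main() {
	var ce ChaosError
	if err := selectTarget(nil); errors.As(err, &ce) {
		// Callers can branch on ce.Code instead of parsing the message.
		fmt.Println(ce.Code, "-", ce.Reason)
	}
}
```

The gain over a formatted string is that downstream tooling can classify failures (target selection vs. injection) without scraping messages.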


@ -1,21 +1,27 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcp "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-disk-loss/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
@ -25,7 +31,9 @@ var (
)
// PrepareDiskVolumeLoss contains the preparation and injection steps for the experiment
func PrepareDiskVolumeLoss(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareDiskVolumeLoss(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMDiskLossFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@ -51,7 +59,7 @@ func PrepareDiskVolumeLoss(computeService *compute.Service, experimentsDetails *
//get the device names for the given disks
if err := getDeviceNamesList(computeService, experimentsDetails, diskNamesList, diskZonesList); err != nil {
return err
return stacktrace.Propagate(err, "failed to fetch the disk device names")
}
select {
@ -65,15 +73,15 @@ func PrepareDiskVolumeLoss(computeService *compute.Service, experimentsDetails *
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, diskNamesList, diskZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
}
@ -87,8 +95,9 @@ func PrepareDiskVolumeLoss(computeService *compute.Service, experimentsDetails *
}
// injectChaosInSerialMode will inject the disk loss chaos in serial mode which means one after the other
func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInSerialMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
@ -103,23 +112,23 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
return errors.Errorf("disk detachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -131,23 +140,23 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
return errors.Errorf("failed to get the disk volume status, err: %v", err)
return stacktrace.Propagate(err, fmt.Sprintf("failed to get %s disk volume status", targetDiskVolumeNamesList[i]))
}
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume back to the instance")
log.Infof("[Chaos]: Attaching %s disk volume back to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return errors.Errorf("disk attachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for %v volume", targetDiskVolumeNamesList[i])
log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
@ -158,7 +167,9 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
}
// injectChaosInParallelMode will inject the disk loss chaos in parallel mode, that is, all at once
func injectChaosInParallelMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, targetDiskVolumeNamesList, diskZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMDiskLossFaultInParallelMode")
defer span.End()
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begins
ChaosStartTimeStamp := time.Now()
@ -175,9 +186,9 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
for i := range targetDiskVolumeNamesList {
//Detaching the disk volume from the instance
log.Info("[Chaos]: Detaching the disk volume from the instance")
log.Infof("[Chaos]: Detaching %s disk volume from the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeDetach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i]); err != nil {
return errors.Errorf("disk detachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk detachment failed")
}
common.SetTargets(targetDiskVolumeNamesList[i], "injected", "DiskVolume", chaosDetails)
@ -186,15 +197,15 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
for i := range targetDiskVolumeNamesList {
//Wait for disk volume detachment
log.Infof("[Wait]: Wait for disk volume detachment for volume %v", targetDiskVolumeNamesList[i])
log.Infof("[Wait]: Wait for %s disk volume detachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to detach the disk volume from the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to detach disk volume from the vm instance")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -213,18 +224,18 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
switch diskState {
case "attached":
log.Info("[Skip]: The disk volume is already attached")
log.Infof("[Skip]: %s disk volume is already attached", targetDiskVolumeNamesList[i])
default:
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume to the instance")
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
if err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i]); err != nil {
return errors.Errorf("disk attachment failed, err: %v", err)
return stacktrace.Propagate(err, "disk attachment failed")
}
//Wait for disk volume attachment
log.Infof("[Wait]: Wait for disk volume attachment for volume %v", targetDiskVolumeNamesList[i])
log.Infof("[Wait]: Wait for %s disk volume attachment", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeAttachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
return errors.Errorf("unable to attach the disk volume to the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to attach disk volume to the vm instance")
}
}
common.SetTargets(targetDiskVolumeNamesList[i], "reverted", "DiskVolume", chaosDetails)
@ -246,25 +257,25 @@ func abortWatcher(computeService *compute.Service, experimentsDetails *experimen
//Getting the disk volume attachment status
diskState, err := gcp.GetDiskVolumeState(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i])
if err != nil {
log.Errorf("failed to get the disk state when an abort signal is received, err: %v", err)
log.Errorf("Failed to get %s disk state when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
if diskState != "attached" {
//Wait for disk volume detachment
//We first wait for the volume to reach the detached state, then attach it.
log.Info("[Abort]: Wait for complete disk volume detachment")
log.Infof("[Abort]: Wait for complete disk volume detachment for %s", targetDiskVolumeNamesList[i])
if err = gcp.WaitForVolumeDetachment(computeService, targetDiskVolumeNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.TargetDiskInstanceNamesList[i], diskZonesList[i], experimentsDetails.Delay, experimentsDetails.Timeout); err != nil {
log.Errorf("unable to detach the disk volume, err: %v", err)
log.Errorf("Unable to detach %s disk volume, err: %v", targetDiskVolumeNamesList[i], err)
}
//Attaching the disk volume to the instance
log.Info("[Chaos]: Attaching the disk volume from the instance")
log.Infof("[Chaos]: Attaching %s disk volume to the instance", targetDiskVolumeNamesList[i])
err = gcp.DiskVolumeAttach(computeService, experimentsDetails.TargetDiskInstanceNamesList[i], experimentsDetails.GCPProjectID, diskZonesList[i], experimentsDetails.DeviceNamesList[i], targetDiskVolumeNamesList[i])
if err != nil {
log.Errorf("disk attachment failed when an abort signal is received, err: %v", err)
log.Errorf("%s disk attachment failed when an abort signal is received, err: %v", targetDiskVolumeNamesList[i], err)
}
}


@@ -1,28 +1,35 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
var inject, abort chan os.Signal
// PrepareVMStopByLabel executes the experiment steps by injecting chaos into target VM instances
func PrepareVMStopByLabel(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareVMStopByLabel(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareGCPVMInstanceStopFaultByLabel")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -48,15 +55,15 @@ func PrepareVMStopByLabel(computeService *compute.Service, experimentsDetails *e
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err := injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err := injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
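The switch above lower-cases the sequence first, so "Serial" and "PARALLEL" are accepted, and anything else is rejected with a coded cerrors.Error. A stdlib-only sketch of the same validation (FaultError is an illustrative stand-in for cerrors.Error):

```go
package main

import (
	"fmt"
	"strings"
)

// FaultError is a stand-in for cerrors.Error: an error carrying a code and a reason.
type FaultError struct {
	Code   string
	Reason string
}

func (e FaultError) Error() string {
	return fmt.Sprintf("{code: %s, reason: %s}", e.Code, e.Reason)
}

// validateSequence mirrors the switch in PrepareVMStopByLabel: normalize
// case, then accept only the two supported injection sequences.
func validateSequence(seq string) error {
	switch strings.ToLower(seq) {
	case "serial", "parallel":
		return nil
	default:
		return FaultError{Code: "GENERIC", Reason: fmt.Sprintf("'%s' sequence is not supported", seq)}
	}
}

func main() {
	fmt.Println(validateSequence("Serial"))  // <nil>
	fmt.Println(validateSequence("random")) // {code: GENERIC, reason: 'random' sequence is not supported}
}
```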
//Waiting for the ramp time after chaos injection
@@ -69,7 +76,9 @@ func PrepareVMStopByLabel(computeService *compute.Service, experimentsDetails *e
}
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInSerialMode")
defer span.End()
select {
case <-inject:
@@ -96,7 +105,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("VM instance failed to stop, err: %v", err)
return stacktrace.Propagate(err, "VM instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
@@ -104,13 +113,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to stop", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("%s vm instance failed to fully shutdown, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to fully shut down")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -125,7 +134,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "unable to start %s vm instance", instanceNamesList[i])
}
default:
@@ -133,13 +142,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
// starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("%s vm instance failed to start, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to start")
}
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in RUNNING state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "unable to start %s vm instance", instanceNamesList[i])
}
}
@@ -154,8 +163,9 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
}
// injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once
func injectChaosInParallelMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectGCPVMInstanceStopFaultByLabelInParallelMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
@@ -181,7 +191,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
// stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("%s vm instance failed to stop, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
@@ -192,13 +202,13 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
// wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("%s vm instance failed to fully shutdown, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to fully shut down")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -215,7 +225,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("unable to start the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
@@ -228,7 +238,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Info("[Chaos]: Starting back the VM instance")
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("vm instance failed to start, err: %v", err)
return stacktrace.Propagate(err, "vm instance failed to start")
}
}
@@ -237,7 +247,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Infof("[Wait]: Wait for VM instance '%v' to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
return errors.Errorf("unable to start the vm instance, err: %v", err)
return stacktrace.Propagate(err, "unable to start the vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
@@ -260,19 +270,19 @@ func abortWatcher(computeService *compute.Service, experimentsDetails *experimen
for i := range instanceNamesList {
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil {
log.Errorf("fail to get instance status when an abort signal is received,err :%v", err)
log.Errorf("Failed to get %s instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
}
if instanceState != "RUNNING" && experimentsDetails.ManagedInstanceGroup != "enable" {
log.Info("[Abort]: Waiting for the VM instance to shut down")
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones); err != nil {
log.Errorf("unable to wait till stop of the instance, err: %v", err)
log.Errorf("Unable to wait till stop of %s instance, err: %v", instanceNamesList[i], err)
}
log.Info("[Abort]: Starting VM instance as abort signal received")
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, experimentsDetails.Zones)
if err != nil {
log.Errorf("vm instance failed to start when an abort signal is received, err: %v", err)
log.Errorf("%s instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
}
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)


@@ -1,21 +1,26 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
gcplib "github.com/litmuschaos/litmus-go/pkg/cloud/gcp"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/gcp/gcp-vm-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"google.golang.org/api/compute/v1"
)
@@ -25,7 +30,9 @@ var (
)
// PrepareVMStop contains the prepration and injection steps for the experiment
func PrepareVMStop(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareVMStop(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMInstanceStopFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -53,15 +60,15 @@ func PrepareVMStop(computeService *compute.Service, experimentsDetails *experime
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, computeService, experimentsDetails, instanceNamesList, instanceZonesList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// wait for the ramp time after chaos injection
@@ -74,7 +81,9 @@ func PrepareVMStop(computeService *compute.Service, experimentsDetails *experime
}
// injectChaosInSerialMode stops VM instances in serial mode i.e. one after the other
func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInSerialMode")
defer span.End()
select {
case <-inject:
@@ -101,7 +110,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s VM instance failed to stop, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
@@ -109,13 +118,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
//Wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to fully shutdown, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to fully shut down")
}
// run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -130,13 +139,13 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
// starting the VM instance
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to start, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to start")
}
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "unable to start vm instance")
}
default:
@@ -144,7 +153,7 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
// wait for VM instance to get in running state
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "unable to start vm instance")
}
}
@@ -159,7 +168,9 @@ func injectChaosInSerialMode(computeService *compute.Service, experimentsDetails
}
// injectChaosInParallelMode stops VM instances in parallel mode i.e. all at once
func injectChaosInParallelMode(computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, computeService *compute.Service, experimentsDetails *experimentTypes.ExperimentDetails, instanceNamesList []string, instanceZonesList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectVMInstanceStopFaultInParallelMode")
defer span.End()
select {
case <-inject:
@@ -186,7 +197,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
// stopping the VM instance
log.Infof("[Chaos]: Stopping %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStop(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to stop, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to stop")
}
common.SetTargets(instanceNamesList[i], "injected", "VM", chaosDetails)
@@ -197,13 +208,13 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
// wait for VM instance to completely stop
log.Infof("[Wait]: Wait for VM instance %s to get in stopped state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to fully shutdown, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to fully shut down")
}
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -219,7 +230,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
for i := range instanceNamesList {
log.Infof("[Chaos]: Starting back %s VM instance", instanceNamesList[i])
if err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("%s vm instance failed to start, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "vm instance failed to start")
}
}
@@ -228,7 +239,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "unable to start vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
@@ -241,7 +252,7 @@ func injectChaosInParallelMode(computeService *compute.Service, experimentsDetai
log.Infof("[Wait]: Wait for VM instance %s to get in running state", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceUp(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, instanceZonesList[i]); err != nil {
return errors.Errorf("unable to start %s vm instance, err: %v", instanceNamesList[i], err)
return stacktrace.Propagate(err, "unable to start vm instance")
}
common.SetTargets(instanceNamesList[i], "reverted", "VM", chaosDetails)
@@ -267,20 +278,20 @@ func abortWatcher(computeService *compute.Service, experimentsDetails *experimen
instanceState, err := gcplib.GetVMInstanceStatus(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil {
log.Errorf("failed to get %s vm instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
log.Errorf("Failed to get %s vm instance status when an abort signal is received, err: %v", instanceNamesList[i], err)
}
if instanceState != "RUNNING" {
log.Infof("[Abort]: Waiting for %s VM instance to shut down", instanceNamesList[i])
if err := gcplib.WaitForVMInstanceDown(computeService, experimentsDetails.Timeout, experimentsDetails.Delay, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i]); err != nil {
log.Errorf("unable to wait till stop of the instance, err: %v", err)
log.Errorf("Unable to wait till stop of %s instance, err: %v", instanceNamesList[i], err)
}
log.Infof("[Abort]: Starting %s VM instance as abort signal is received", instanceNamesList[i])
err := gcplib.VMInstanceStart(computeService, instanceNamesList[i], experimentsDetails.GCPProjectID, zonesList[i])
if err != nil {
log.Errorf("%s vm instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
log.Errorf("%s VM instance failed to start when an abort signal is received, err: %v", instanceNamesList[i], err)
}
}


@@ -1,12 +1,16 @@
package helper
import (
"bytes"
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/exec"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
@@ -17,7 +21,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
clientTypes "k8s.io/apimachinery/pkg/types"
)
@@ -27,7 +30,9 @@ var (
)
// Helper injects the http chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodHTTPFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -48,10 +53,11 @@ func Helper(clients clients.ClientSets) {
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Intialise the chaos attributes
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise Chaos Result Parameters
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
@@ -59,22 +65,67 @@ func Helper(clients clients.ClientSets) {
err := prepareK8sHttpChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
// prepareK8sHttpChaos contains the prepration steps before chaos injection
// prepareK8sHttpChaos contains the preparation steps before chaos injection
func prepareK8sHttpChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
containerID, err := common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, experimentsDetails.TargetPods, experimentsDetails.AppNS, experimentsDetails.TargetContainer, clients)
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return err
return stacktrace.Propagate(err, "could not parse targets")
}
// extract out the pid of the target container
targetPID, err := common.GetPauseAndSandboxPID(experimentsDetails.ContainerRuntime, containerID, experimentsDetails.SocketPath)
if err != nil {
return err
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPauseAndSandboxPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
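The select above is a non-blocking read of the inject channel: if an abort signal is already pending the helper exits before touching the targets, otherwise the default case falls through to injection. A sketch of that idiom (shouldAbort is an illustrative stand-in):

```go
package main

import "fmt"

// shouldAbort performs the non-blocking check used before injection: a
// select with a default case reads the signal channel without blocking.
func shouldAbort(inject chan struct{}) bool {
	select {
	case <-inject:
		return true // abort signal already received; skip injection
	default:
		return false // no signal pending; proceed with chaos injection
	}
}

func main() {
	inject := make(chan struct{}, 1)
	fmt.Println(shouldAbort(inject)) // false
	inject <- struct{}{}
	fmt.Println(shouldAbort(inject)) // true
}
```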
for _, t := range targets {
// injecting http chaos inside target container
if err = injectChaos(experimentsDetails, t); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaos(experimentsDetails, t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
@@ -84,71 +135,67 @@ func prepareK8sHttpChaos(experimentsDetails *experimentTypes.ExperimentDetails,
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// watching for the abort signal and revert the chaos
go abortWatcher(targetPID, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails)
// injecting http chaos inside target container
if err = injectChaos(experimentsDetails, targetPID); err != nil {
return err
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment")
log.Info("[Chaos]: chaos duration is over, reverting chaos")
// clean up the proxy and ip rules after chaos injection
if err = revertChaos(experimentsDetails, targetPID); err != nil {
return err
var errList []string
for _, t := range targets {
// clean up the ip rules and proxy process after chaos injection
err := revertChaos(experimentsDetails, t)
if err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
return result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods)
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// injectChaos injects the http chaos in the target container and adds a ruleset to iptables to redirect the ports
func injectChaos(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
// proceed for chaos injection
if err := startProxy(experimentDetails, pid); err != nil {
_ = killProxy(pid)
return errors.Errorf("failed to start proxy, err: %v", err)
func injectChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
if err := startProxy(experimentDetails, t.Pid); err != nil {
killErr := killProxy(t.Pid, t.Source)
if killErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
}
if err := addIPRuleSet(experimentDetails, pid); err != nil {
_ = killProxy(pid)
return errors.Errorf("failed to add ip rule set, err: %v", err)
return stacktrace.Propagate(err, "could not start proxy server")
}
if err := addIPRuleSet(experimentDetails, t.Pid); err != nil {
killErr := killProxy(t.Pid, t.Source)
if killErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(killErr).Error())}
}
return stacktrace.Propagate(err, "could not add ip rules")
}
return nil
}
// revertChaos reverts the http chaos in the target container
func revertChaos(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
func revertChaos(experimentDetails *experimentTypes.ExperimentDetails, t targetDetails) error {
var revertError error
revertError = nil
var errList []string
if err := removeIPRuleSet(experimentDetails, pid); err != nil {
revertError = errors.Errorf("failed to remove ip rule set, err: %v", err)
if err := removeIPRuleSet(experimentDetails, t.Pid); err != nil {
errList = append(errList, err.Error())
}
if err := killProxy(pid); err != nil {
if revertError != nil {
revertError = errors.Errorf("%v and failed to kill proxy server, err: %v", revertError, err)
} else {
revertError = errors.Errorf("failed to kill proxy server, err: %v", err)
}
if err := killProxy(t.Pid, t.Source); err != nil {
errList = append(errList, err.Error())
}
return revertError
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
return nil
}
// startProxy starts the proxy process inside the target container
@ -169,7 +216,7 @@ func startProxy(experimentDetails *experimentTypes.ExperimentDetails, pid int) e
log.Infof("[Chaos]: Starting proxy server")
if err := runCommand(chaosCommand); err != nil {
if err := common.RunBashCommand(chaosCommand, "failed to start proxy server", experimentDetails.ChaosPodName); err != nil {
return err
}
@ -177,14 +224,16 @@ func startProxy(experimentDetails *experimentTypes.ExperimentDetails, pid int) e
return nil
}
const NoProxyToKill = "you need to specify whom to kill"
// killProxy kills the proxy process inside the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the proxy-related command inside it.
func killProxy(pid int) error {
stopProxyServerCommand := fmt.Sprintf("sudo nsenter -t %d -n sudo kill -9 $(ps aux | grep [t]oxiproxy | awk 'FNR==1{print $1}')", pid)
func killProxy(pid int, source string) error {
stopProxyServerCommand := fmt.Sprintf("sudo nsenter -t %d -n sudo kill -9 $(ps aux | grep [t]oxiproxy | awk 'FNR==2{print $2}')", pid)
log.Infof("[Chaos]: Stopping proxy server")
if err := runCommand(stopProxyServerCommand); err != nil {
if err := common.RunBashCommand(stopProxyServerCommand, "failed to stop proxy server", source); err != nil {
return err
}
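The kill one-liner above is worth unpacking: nsenter joins the target container's namespace via its PID, the bracketed grep pattern `[t]oxiproxy` keeps the grep process itself out of the `ps` listing, and the awk filter picks the PID column (`$2`) from the match. A minimal sketch of how such a command string is assembled (an illustration, not the library's `killProxy`):

```go
package main

import (
	"fmt"
	"strings"
)

// buildKillProxyCommand reconstructs the shell one-liner for illustration:
// enter the target's network namespace with nsenter, find the toxiproxy
// process via ps/grep/awk, and kill it. The "[t]oxiproxy" pattern prevents
// the grep command from matching its own ps entry.
func buildKillProxyCommand(pid int) string {
	return fmt.Sprintf(
		"sudo nsenter -t %d -n sudo kill -9 $(ps aux | grep [t]oxiproxy | awk 'FNR==2{print $2}')",
		pid)
}

func main() {
	cmd := buildKillProxyCommand(4321)
	fmt.Println(cmd)
	if !strings.Contains(cmd, "nsenter -t 4321") {
		panic("expected target pid in command")
	}
}
```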
@ -202,7 +251,7 @@ func addIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int)
addIPRuleSetCommand := fmt.Sprintf("(sudo nsenter -t %d -n iptables -t nat -I PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d)", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Adding IPtables ruleset")
if err := runCommand(addIPRuleSetCommand); err != nil {
if err := common.RunBashCommand(addIPRuleSetCommand, "failed to add ip rules", experimentDetails.ChaosPodName); err != nil {
return err
}
@ -210,6 +259,8 @@ func addIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid int)
return nil
}
const NoIPRulesetToRemove = "No chain/target/match by that name"
// removeIPRuleSet removes the ip rule set from iptables in the target container.
// It uses the nsenter command to enter the network namespace of the target container
// and executes the iptables-related command inside it.
@ -217,7 +268,7 @@ func removeIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid i
removeIPRuleSetCommand := fmt.Sprintf("sudo nsenter -t %d -n iptables -t nat -D PREROUTING -i %v -p tcp --dport %d -j REDIRECT --to-port %d", pid, experimentDetails.NetworkInterface, experimentDetails.TargetServicePort, experimentDetails.ProxyPort)
log.Infof("[Chaos]: Removing IPtables ruleset")
if err := runCommand(removeIPRuleSetCommand); err != nil {
if err := common.RunBashCommand(removeIPRuleSetCommand, "failed to remove ip rules", experimentDetails.ChaosPodName); err != nil {
return err
}
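The add and remove commands above are deliberately symmetric: `iptables -D` only deletes a rule whose specification exactly matches the one inserted with `-I`, so both commands must be built from the same template. A sketch of that pairing (hedged reconstruction, not the library helpers):

```go
package main

import (
	"fmt"
	"strings"
)

// redirectRule builds an iptables NAT rule command inside the target's network
// namespace: TCP traffic arriving on the service port is redirected to the
// proxy port. action is "-I" to insert the rule and "-D" to delete it; the
// rest of the rule spec must be identical in both cases so that -D can match.
func redirectRule(action string, pid int, iface string, servicePort, proxyPort int) string {
	return fmt.Sprintf(
		"sudo nsenter -t %d -n iptables -t nat %s PREROUTING -i %s -p tcp --dport %d -j REDIRECT --to-port %d",
		pid, action, iface, servicePort, proxyPort)
}

func main() {
	add := redirectRule("-I", 1234, "eth0", 80, 20000)
	del := redirectRule("-D", 1234, "eth0", 80, 20000)
	fmt.Println(add)
	fmt.Println(del)
	// the two commands must differ only in the action flag
	if strings.Replace(add, "-I", "-D", 1) != del {
		panic("add/remove rule specs must match")
	}
}
```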
@ -229,10 +280,6 @@ func removeIPRuleSet(experimentDetails *experimentTypes.ExperimentDetails, pid i
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@ -246,27 +293,8 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.Toxicity, _ = strconv.Atoi(types.Getenv("TOXICITY", "100"))
}
func runCommand(chaosCommand string) error {
var stdout, stderr bytes.Buffer
cmd := exec.Command("/bin/bash", "-c", chaosCommand)
cmd.Stdout = &stdout
cmd.Stderr = &stderr
err := cmd.Run()
errStr := stderr.String()
if err != nil {
// if the command produced standard error output, return it
if errStr != "" {
return errors.New(errStr)
}
// if no standard error output was found, return the exec error
return err
}
return nil
}
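The removed `runCommand` above is replaced in this change by the shared `common.RunBashCommand`, but the underlying pattern is the same: run the shell command, and on failure prefer the captured stderr text over exec's bare exit-status error, since stderr is usually more descriptive. A self-contained sketch of that pattern (using `sh` instead of `/bin/bash` for portability; not the library implementation):

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// runShellCommand executes a shell command, capturing stdout and stderr.
// If the command fails and wrote anything to stderr, that text is returned
// as the error; otherwise the original exec error is returned.
func runShellCommand(command string) error {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command("sh", "-c", command)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		if s := stderr.String(); s != "" {
			return errors.New(s)
		}
		return err
	}
	return nil
}

func main() {
	if err := runShellCommand("true"); err != nil {
		panic(err)
	}
	if err := runShellCommand("echo boom >&2; exit 1"); err != nil {
		fmt.Println("captured:", err)
	}
}
```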
// abortWatcher continuously watches for the abort signals
func abortWatcher(targetPID int, resultName, chaosNS string, experimentDetails *experimentTypes.ExperimentDetails) {
func abortWatcher(targets []targetDetails, resultName, chaosNS string, experimentDetails *experimentTypes.ExperimentDetails) {
<-abort
log.Info("[Abort]: Killing process started because of terminated signal received")
@ -274,23 +302,31 @@ func abortWatcher(targetPID int, resultName, chaosNS string, experimentDetails *
retry := 3
for retry > 0 {
if err = revertChaos(experimentDetails, targetPID); err != nil {
retry--
// If retries are left
if retry > 0 {
log.Errorf("[Abort]: Failed to revert chaos, retrying %d more times, err: %v", retry, err)
time.Sleep(1 * time.Second)
for _, t := range targets {
if err = revertChaos(experimentDetails, t); err != nil {
if strings.Contains(err.Error(), NoIPRulesetToRemove) && strings.Contains(err.Error(), NoProxyToKill) {
continue
}
log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
continue
}
// else exit with error
log.Errorf("[Abort]: Chaos Revert Failed")
os.Exit(1)
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", experimentDetails.TargetPods); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
retry--
time.Sleep(1 * time.Second)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
Source string
}
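The `targetDetails` struct above is populated in the helper from the `TARGETS` env var, which the experiment builds as `name:namespace:container` triples joined with `;` per node (see the `fmt.Sprintf("%s:%s:%s", ...)` and `strings.Join(targetsPerNode, ";")` calls in the parallel-mode changes below in this diff). A sketch of that encoding and its inverse (an assumed round-trip for illustration; the helper-side parser is not shown here):

```go
package main

import (
	"fmt"
	"strings"
)

// target mirrors the name/namespace/container fields of targetDetails.
type target struct {
	Name, Namespace, Container string
}

// encode serialises targets the way the experiment passes them to the helper:
// "name:namespace:container" triples separated by ';'.
func encode(ts []target) string {
	parts := make([]string, 0, len(ts))
	for _, t := range ts {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	return strings.Join(parts, ";")
}

// decode parses the ';'-separated triples back into target structs.
func decode(s string) ([]target, error) {
	var out []target
	for _, p := range strings.Split(s, ";") {
		f := strings.Split(p, ":")
		if len(f) != 3 {
			return nil, fmt.Errorf("malformed target %q", p)
		}
		out = append(out, target{f[0], f[1], f[2]})
	}
	return out, nil
}

func main() {
	in := []target{{"nginx-0", "default", "nginx"}, {"nginx-1", "default", "nginx"}}
	enc := encode(in)
	fmt.Println(enc) // nginx-0:default:nginx;nginx-1:default:nginx
	out, err := decode(enc)
	if err != nil || len(out) != 2 || out[1].Name != "nginx-1" {
		panic("round-trip failed")
	}
}
```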


@ -1,16 +1,22 @@
package header
import (
"context"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
//PodHttpModifyHeaderChaos contains the steps to prepare and inject http modify header chaos
func PodHttpModifyHeaderChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodHttpModifyHeaderChaos contains the steps to prepare and inject http modify header chaos
func PodHttpModifyHeaderChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyHeaderFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
@ -27,5 +33,5 @@ func PodHttpModifyHeaderChaos(experimentsDetails *experimentTypes.ExperimentDeta
stream = "upstream"
}
args := "-t header --" + stream + " -a headers='" + (experimentsDetails.HeadersMap) + "' -a mode=" + experimentsDetails.HeaderMode
return http_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@ -2,64 +2,45 @@ package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
targetPodList := apiv1.PodList{}
var err error
var podsAffectedPerc int
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc)
if experimentsDetails.NodeLabel == "" {
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
var podNames []string
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("[Info]: Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@ -70,42 +51,42 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, args, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the http chaos into all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInSerialMode")
defer span.End()
var err error
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -115,10 +96,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
@ -126,33 +104,16 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for http chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@ -160,79 +121,54 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode injects the http chaos into all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, args string, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodHTTPFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform http chaos
for _, pod := range targetPodList.Items {
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//Get the target container name of the application pod
//It checks the empty target container for the first iteration only
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
// Deleting all the helper pod for http chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, args, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID, args string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateHTTPChaosHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
@ -265,7 +201,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name http-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName, args),
Env: getPodEnv(ctx, experimentsDetails, targets, args),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
@ -286,18 +222,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, args string) []apiv1.EnvVar {
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets, args string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@ -310,13 +251,15 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, a
SetEnv("TARGET_SERVICE_PORT", strconv.Itoa(experimentsDetails.TargetServicePort)).
SetEnv("PROXY_PORT", strconv.Itoa(experimentsDetails.ProxyPort)).
SetEnv("TOXICITY", strconv.Itoa(experimentsDetails.Toxicity)).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
//SetChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value.
// SetChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)


@ -1,18 +1,23 @@
package latency
import (
"context"
"strconv"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
//PodHttpLatencyChaos contains the steps to prepare and inject http latency chaos
func PodHttpLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodHttpLatencyChaos contains the steps to prepare and inject http latency chaos
func PodHttpLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpLatencyFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
@ -24,5 +29,5 @@ func PodHttpLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails,
})
args := "-t latency -a latency=" + strconv.Itoa(experimentsDetails.Latency)
return http_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@ -1,20 +1,25 @@
package modifybody
import (
"context"
"fmt"
"math"
"strings"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
// PodHttpModifyBodyChaos contains the steps to prepare and inject http modify body chaos
func PodHttpModifyBodyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PodHttpModifyBodyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPModifyBodyFault")
defer span.End()
// responseBodyMaxLength defines the max length of response body string to be printed. It is taken as
// the min of length of body and 120 characters to avoid printing large response body.
@@ -34,7 +39,7 @@ func PodHttpModifyBodyChaos(experimentsDetails *experimentTypes.ExperimentDetail
args := fmt.Sprintf(
`-t modify_body -a body="%v" -a content_type=%v -a content_encoding=%v`,
EscapeQuotes(experimentsDetails.ResponseBody), experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
return http_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// EscapeQuotes escapes the quotes in the given string

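The modify-body fault embeds the response body inside a double-quoted `-a body="…"` argument, which is why quotes must be escaped first. A sketch of what an `EscapeQuotes` helper typically does (the actual implementation in the package may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// escapeQuotes backslash-escapes double quotes so an arbitrary response body
// can be embedded inside a double-quoted CLI argument without breaking it.
// Sketch only; names are illustrative.
func escapeQuotes(s string) string {
	return strings.ReplaceAll(s, `"`, `\"`)
}

func main() {
	fmt.Println(escapeQuotes(`{"status":"degraded"}`))
}
```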

@@ -1,18 +1,23 @@
package reset
import (
"context"
"strconv"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
//PodHttpResetPeerChaos contains the steps to prepare and inject http reset peer chaos
func PodHttpResetPeerChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodHttpResetPeerChaos contains the steps to prepare and inject http reset peer chaos
func PodHttpResetPeerChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHTTPResetPeerFault")
defer span.End()
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Target Port": experimentsDetails.TargetServicePort,
@@ -24,5 +29,5 @@ func PodHttpResetPeerChaos(experimentsDetails *experimentTypes.ExperimentDetails
})
args := "-t reset_peer -a timeout=" + strconv.Itoa(experimentsDetails.ResetTimeout)
return http_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@@ -1,6 +1,7 @@
package statuscode
import (
"context"
"fmt"
"math"
"math/rand"
@@ -8,13 +9,16 @@ import (
"strings"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"go.opentelemetry.io/otel"
http_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib"
body "github.com/litmuschaos/litmus-go/chaoslib/litmus/http-chaos/lib/modify-body"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
)
@@ -26,7 +30,9 @@ var acceptedStatusCodes = []string{
}
// PodHttpStatusCodeChaos contains the steps to prepare and inject http status code chaos
func PodHttpStatusCodeChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PodHttpStatusCodeChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodHttpStatusCodeFault")
defer span.End()
// responseBodyMaxLength defines the max length of response body string to be printed. It is taken as
// the min of length of body and 120 characters to avoid printing large response body.
@@ -49,7 +55,7 @@ func PodHttpStatusCodeChaos(experimentsDetails *experimentTypes.ExperimentDetail
`-t status_code -a status_code=%s -a modify_response_body=%d -a response_body="%v" -a content_type=%s -a content_encoding=%s`,
experimentsDetails.StatusCode, stringBoolToInt(experimentsDetails.ModifyResponseBody), body.EscapeQuotes(experimentsDetails.ResponseBody),
experimentsDetails.ContentType, experimentsDetails.ContentEncoding)
return http_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
return http_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// GetStatusCode performs two functions:
@@ -71,11 +77,11 @@ func GetStatusCode(statusCode string) (string, error) {
} else {
acceptedCodes := getAcceptedCodesInList(statusCodeList, acceptedStatusCodes)
if len(acceptedCodes) == 0 {
return "", errors.Errorf("invalid status code provided, code: %s", statusCode)
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("invalid status code: %s", statusCode)}
}
return acceptedCodes[rand.Intn(len(acceptedCodes))], nil
}
return "", errors.Errorf("status code %s is not supported. \nList of supported status codes: %v", statusCode, acceptedStatusCodes)
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("status code '%s' is not supported. Supported status codes are: %v", statusCode, acceptedStatusCodes)}
}
// getAcceptedCodesInList returns the list of accepted status codes from a list of status codes

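`GetStatusCode` filters the user-supplied comma-separated codes against the supported list and then picks one at random. A sketch of that filter-then-pick flow (function and variable names here are illustrative, not the package's exported API):

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// acceptedCodesInList keeps only the user-provided codes that appear in the
// supported list, mirroring the getAcceptedCodesInList step described above.
func acceptedCodesInList(codes string, accepted []string) []string {
	var out []string
	for _, c := range strings.Split(codes, ",") {
		for _, a := range accepted {
			if strings.TrimSpace(c) == a {
				out = append(out, a)
			}
		}
	}
	return out
}

func main() {
	accepted := []string{"200", "404", "500", "503"}
	ok := acceptedCodesInList("404,999,503", accepted)
	fmt.Println(ok) // only the supported codes survive the filter
	if len(ok) > 0 {
		// as in GetStatusCode, one surviving code is chosen at random
		fmt.Println("picked:", ok[rand.Intn(len(ok))])
	}
}
```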

@@ -0,0 +1,165 @@
package lib
import (
"context"
"fmt"
"os"
"strconv"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/load/k6-loadgen/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectK6LoadGenFault")
defer span.End()
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
runID := stringutils.GetRunID()
// creating the helper pod to perform k6-loadgen chaos
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareK6LoadGenFault")
defer span.End()
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Starting the k6-loadgen experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not execute chaos")
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateK6LoadGenFaultHelperPod")
defer span.End()
const volumeName = "script-volume"
const mountPath = "/mnt"
var envs []corev1.EnvVar
args := []string{
mountPath + "/" + experimentsDetails.ScriptSecretKey,
"-q",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--tag",
"trace_id=" + span.SpanContext().TraceID().String(),
}
if otelExporterEndpoint := os.Getenv(telemetry.OTELExporterOTLPEndpoint); otelExporterEndpoint != "" {
envs = []corev1.EnvVar{
{
Name: "K6_OTEL_METRIC_PREFIX",
Value: experimentsDetails.OTELMetricPrefix,
},
{
Name: "K6_OTEL_GRPC_EXPORTER_INSECURE",
Value: "true",
},
{
Name: "K6_OTEL_GRPC_EXPORTER_ENDPOINT",
Value: otelExporterEndpoint,
},
}
args = append(args, "--out", "experimental-opentelemetry")
}
helperPod := &corev1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: corev1.PodSpec{
RestartPolicy: corev1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Containers: []corev1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: corev1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"k6",
"run",
},
Args: args,
Env: envs,
Resources: chaosDetails.Resources,
VolumeMounts: []corev1.VolumeMount{
{
Name: volumeName,
MountPath: mountPath,
},
},
},
},
Volumes: []corev1.Volume{
{
Name: volumeName,
VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: experimentsDetails.ScriptSecretName,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
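`createHelperPod` assembles the `k6 run` argument list from the mounted script path, the chaos duration, and the current trace ID. A sketch of that assembly (the function name and the `traceID` placeholder are illustrative):

```go
package main

import (
	"fmt"
	"strconv"
)

// buildK6Args mirrors the args slice built in the diff: script path under the
// secret mount, quiet mode, a --duration flag, and a trace_id tag so k6
// metrics can be correlated with the fault's span.
func buildK6Args(mountPath, scriptKey string, durationSec int, traceID string) []string {
	return []string{
		mountPath + "/" + scriptKey,
		"-q",
		"--duration", strconv.Itoa(durationSec) + "s",
		"--tag", "trace_id=" + traceID,
	}
}

func main() {
	fmt.Println(buildK6Args("/mnt", "script.js", 60, "abc123"))
}
```

When an OTEL exporter endpoint is set, the real code additionally appends `--out experimental-opentelemetry` and injects the `K6_OTEL_*` environment variables shown above.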


@@ -2,26 +2,33 @@ package lib
import (
"context"
"fmt"
"strconv"
"strings"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/workloads"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kafka/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/annotation"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PreparePodDelete contains the prepration steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKafkaPodDeleteFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.ChaoslibDetail.RampTime != 0 {
@@ -31,15 +38,15 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
switch strings.ToLower(experimentsDetails.ChaoslibDetail.Sequence) {
case "serial":
if err := injectChaosInSerialMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.ChaoslibDetail.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.ChaoslibDetail.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -51,11 +58,12 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
}
// injectChaosInSerialMode deletes the kafka broker pods in serial mode (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -68,9 +76,10 @@ func injectChaosInSerialMode(experimentsDetai
for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or KAFKA_BROKER")
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
if err != nil {
@@ -78,17 +87,15 @@ func injectChaosInSerialMode(experimentsDetai
}
// deriving the parent name of the target resources
if chaosDetails.AppDetail.Kind != "" {
for _, pod := range targetPodList.Items {
parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
if err != nil {
return err
}
common.SetParentName(parentName, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return err
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.ChaoslibDetail.EngineName != "" {
@@ -104,18 +111,18 @@ func injectChaosInSerialMode(experimentsDetai
"PodName": pod.Name})
if experimentsDetails.ChaoslibDetail.Force {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
return err
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
@@ -128,8 +135,15 @@ func injectChaosInSerialMode(experimentsDetai
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.ChaoslibDetail.AppNS, experimentsDetails.ChaoslibDetail.AppLabel, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@@ -140,11 +154,12 @@ func injectChaosInSerialMode(experimentsDetai
}
// injectChaosInParallelMode deletes the kafka broker pods in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectKafkaPodDeleteFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -157,27 +172,25 @@ func injectChaosInParallelMode(experimentsDet
for duration < experimentsDetails.ChaoslibDetail.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or KAFKA_BROKER")
if experimentsDetails.KafkaBroker == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or KAFKA_BROKER"}
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.ChaoslibDetail.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.KafkaBroker, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
if chaosDetails.AppDetail.Kind != "" {
for _, pod := range targetPodList.Items {
parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
if err != nil {
return err
}
common.SetParentName(parentName, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
for _, pod := range targetPodList.Items {
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
if experimentsDetails.ChaoslibDetail.EngineName != "" {
@@ -193,19 +206,19 @@ func injectChaosInParallelMode(experimentsDet
"PodName": pod.Name})
if experimentsDetails.ChaoslibDetail.Force {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaoslibDetail.ChaosInterval); err != nil {
return err
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
@@ -218,8 +231,15 @@ func injectChaosInParallelMode(experimentsDet
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.ChaoslibDetail.AppNS, experimentsDetails.ChaoslibDetail.AppLabel, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.ChaoslibDetail.Timeout, experimentsDetails.ChaoslibDetail.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())

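The post-chaos verification above replaces the old label-based status check with one workload-scoped target per parent resource. A sketch of that fan-out (the struct shapes stand in for `chaosDetails.ParentsResources` entries and `types.AppDetails`; field names are assumptions for illustration):

```go
package main

import "fmt"

// parentResource and appDetails stand in for the parent-resource entries and
// the types.AppDetails target used by the verification loop; the field names
// are assumptions for illustration.
type parentResource struct{ Name, Kind, Namespace string }

type appDetails struct {
	Names     []string
	Kind      string
	Namespace string
}

// targetsFromParents mirrors the loop above: build one workload-scoped
// target per parent, each of which would then be passed to the pod-status check.
func targetsFromParents(parents []parentResource) []appDetails {
	var targets []appDetails
	for _, p := range parents {
		targets = append(targets, appDetails{
			Names:     []string{p.Name},
			Kind:      p.Kind,
			Namespace: p.Namespace,
		})
	}
	return targets
}

func main() {
	parents := []parentResource{{Name: "kafka", Kind: "statefulset", Namespace: "kafka-ns"}}
	fmt.Println(targetsFromParents(parents))
}
```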

@@ -2,31 +2,38 @@ package lib
import (
"context"
"fmt"
"strconv"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/kubelet-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareKubeletKill contains preparation steps before chaos injection
func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareKubeletKill(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareKubeletServiceKillFault")
defer span.End()
var err error
if experimentsDetails.TargetNode == "" {
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@@ -34,7 +41,7 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
"NodeName": experimentsDetails.TargetNode,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -50,54 +57,33 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
// Creating the helper pod to perform node memory hog
if err = createHelperPod(experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, clients, chaosDetails, experimentsDetails.TargetNode); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(experimentsDetails.TargetNode, "targeted", "node", chaosDetails)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
}
// Checking for the node to be in not-ready state
log.Info("[Status]: Check for the node to be in NotReady state")
if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("application node is not in NotReady state, err: %v", err)
if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
}
return stacktrace.Propagate(err, "could not check for NOT READY state")
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@@ -105,11 +91,14 @@ func PrepareKubeletKill(experimentsDetails *experimentTypes.ExperimentDetails, c
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateKubeletServiceKillFaultHelperPod")
defer span.End()
privileged := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
@@ -118,7 +107,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, "", experimentsDetails.ExperimentName),
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
@ -190,8 +179,16 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
func ptrint64(p int64) *int64 {


@ -1,6 +1,7 @@
package helper
import (
"context"
"fmt"
"os"
"os/exec"
@ -10,8 +11,13 @@ import (
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
@ -26,14 +32,15 @@ const (
)
var (
err error
inject, abort chan os.Signal
err error
inject, abort chan os.Signal
sPorts, dPorts, whitelistDPorts, whitelistSPorts []string
)
var destIps, sPorts, dPorts []string
// Helper injects the network chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodNetworkFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@ -54,10 +61,11 @@ func Helper(clients clients.ClientSets) {
log.Info("[PreReq]: Getting the ENV variables")
getENV(&experimentsDetails)
// Intialise the chaos attributes
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Intialise Chaos Result Parameters
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
// Set the chaos result uid
@ -65,213 +73,304 @@ func Helper(clients clients.ClientSets) {
err := preparePodNetworkChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails)
if err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
//preparePodNetworkChaos contains the prepration steps before chaos injection
// preparePodNetworkChaos contains the preparation steps before chaos injection
func preparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
containerID, err := common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, experimentsDetails.TargetPods, experimentsDetails.AppNS, experimentsDetails.TargetContainer, clients)
if err != nil {
return err
}
// extract out the pid of the target container
targetPID, err := common.GetPauseAndSandboxPID(experimentsDetails.ContainerRuntime, containerID, experimentsDetails.SocketPath)
if err != nil {
return err
targetEnv := os.Getenv("TARGETS")
if targetEnv == "" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "no target found, provide at least one target"}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
var targets []targetDetails
for _, t := range strings.Split(targetEnv, ";") {
target := strings.Split(t, ":")
if len(target) != 4 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: fmt.Sprintf("unsupported target format: '%v'", t)}
}
td := targetDetails{
Name: target[0],
Namespace: target[1],
TargetContainer: target[2],
DestinationIps: getDestIps(target[3]),
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetRuntimeBasedContainerID(experimentsDetails.ContainerRuntime, experimentsDetails.SocketPath, td.Name, td.Namespace, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the network ns path of the pod sandbox or pause container
td.NetworkNsPath, err = common.GetNetworkNsPath(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container network ns path")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos
go abortWatcher(targetPID, experimentsDetails.NetworkInterface, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails.TargetPods)
// injecting network chaos inside target container
if err = injectChaos(experimentsDetails, targetPID); err != nil {
return err
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Stopping the experiment")
// cleaning the netem process after chaos injection
if err = killnetem(targetPID, experimentsDetails.NetworkInterface); err != nil {
return err
}
return result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods)
}
// injectChaos injects the network chaos in the target container.
// It uses the nsenter command to enter the network namespace of the
// target container and executes the tc/netem command inside it.
func injectChaos(experimentDetails *experimentTypes.ExperimentDetails, pid int) error {
netemCommands := os.Getenv("NETEM_COMMAND")
go abortWatcher(targets, experimentsDetails.NetworkInterface, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
if len(destIps) == 0 && len(sPorts) == 0 && len(dPorts) == 0 {
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc replace dev %s root netem %v", pid, experimentDetails.NetworkInterface, netemCommands)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
}
for index, t := range targets {
// injecting network chaos inside target container
if err = injectChaos(experimentsDetails.NetworkInterface, t); err != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
if experimentsDetails.EngineName != "" {
msg := "Injected " + experimentsDetails.ExperimentName + " chaos on application pods"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: Duration is over, reverting chaos")
if err := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
return nil
}
func revertChaosForAllTargets(targets []targetDetails, networkInterface string, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
killed, err := killnetem(targets[i], networkInterface)
if !killed && err != nil {
errList = append(errList, err.Error())
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// injectChaos injects the network chaos in the target container.
// It uses the nsenter command to enter the network namespace of the
// target container and executes the tc/netem command inside it.
func injectChaos(netInterface string, target targetDetails) error {
netemCommands := os.Getenv("NETEM_COMMAND")
if len(target.DestinationIps) == 0 && len(sPorts) == 0 && len(dPorts) == 0 && len(whitelistDPorts) == 0 && len(whitelistSPorts) == 0 {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %s root %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create tc rules", target.Source); err != nil {
return err
}
} else {
// Create a priority-based queue
// This instantly creates classes 1:1, 1:2, 1:3
priority := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v root handle 1: prio", target.NetworkNsPath, netInterface)
log.Info(priority)
if err := common.RunBashCommand(priority, "failed to create priority-based queue", target.Source); err != nil {
return err
}
// Add queueing discipline for 1:3 class.
// No traffic is going through 1:3 yet
traffic := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v parent 1:3 %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(traffic)
if err := common.RunBashCommand(traffic, "failed to create netem queueing discipline", target.Source); err != nil {
return err
}
if len(whitelistDPorts) != 0 || len(whitelistSPorts) != 0 {
for _, port := range whitelistDPorts {
//redirect traffic to specific dport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip dport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist dport match filters", target.Source); err != nil {
return err
}
}
for _, port := range whitelistSPorts {
//redirect traffic to specific sport through band 2
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip sport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist sport match filters", target.Source); err != nil {
return err
}
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst 0.0.0.0/0 flowid 1:3", target.NetworkNsPath, netInterface)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create rule for all ports match filters", target.Source); err != nil {
return err
}
} else {
// Create a priority-based queue
// This instantly creates classes 1:1, 1:2, 1:3
priority := fmt.Sprintf("sudo nsenter -t %v -n tc qdisc replace dev %v root handle 1: prio", pid, experimentDetails.NetworkInterface)
cmd := exec.Command("/bin/bash", "-c", priority)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
// Add queueing discipline for 1:3 class.
// No traffic is going through 1:3 yet
traffic := fmt.Sprintf("sudo nsenter -t %v -n tc qdisc replace dev %v parent 1:3 netem %v", pid, experimentDetails.NetworkInterface, netemCommands)
cmd = exec.Command("/bin/bash", "-c", traffic)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
return err
}
for _, ip := range destIps {
// redirect traffic to specific IP through band 3
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst %v flowid 1:3", pid, experimentDetails.NetworkInterface, ip)
if strings.Contains(ip, ":") {
tc = fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip6 dst %v flowid 1:3", pid, experimentDetails.NetworkInterface, ip)
for i := range target.DestinationIps {
var (
ip = target.DestinationIps[i]
ports []string
isIPV6 = strings.Contains(target.DestinationIps[i], ":")
)
// extracting the destination ports from the ips
// ip format is ip(|port1|port2....|portx)
if strings.Contains(target.DestinationIps[i], "|") {
ip = strings.Split(target.DestinationIps[i], "|")[0]
ports = strings.Split(target.DestinationIps[i], "|")[1:]
}
cmd = exec.Command("/bin/bash", "-c", tc)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
// redirect traffic to specific IP through band 3
filter := fmt.Sprintf("match ip dst %v", ip)
if isIPV6 {
filter = fmt.Sprintf("match ip6 dst %v", ip)
}
if len(ports) != 0 {
for _, port := range ports {
portFilter := fmt.Sprintf("%s match ip dport %v 0xffff", filter, port)
if isIPV6 {
portFilter = fmt.Sprintf("%s match ip6 dport %v 0xffff", filter, port)
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, portFilter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
continue
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, filter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
for _, port := range sPorts {
//redirect traffic to specific sport through band 3
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip sport %v 0xffff flowid 1:3", pid, experimentDetails.NetworkInterface, port)
cmd = exec.Command("/bin/bash", "-c", tc)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip sport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create source ports match filters", target.Source); err != nil {
return err
}
}
for _, port := range dPorts {
//redirect traffic to specific dport through band 3
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dport %v 0xffff flowid 1:3", pid, experimentDetails.NetworkInterface, port)
cmd = exec.Command("/bin/bash", "-c", tc)
out, err = cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ports match filters", target.Source); err != nil {
return err
}
}
}
}
log.Infof("chaos injected successfully on {pod: %v, container: %v}", target.Name, target.TargetContainer)
return nil
}
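The new ip+port filtering above encodes optional destination ports into each DESTINATION_IPS entry as `ip(|port1|port2...|portx)` and expands each entry into one tc u32 match clause per port (or a single clause when no ports are given). A sketch of that expansion, assuming the `ip|port` encoding from the code above (the helper name `buildDstFilters` is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// buildDstFilters expands one destination entry of the form "ip" or
// "ip|port1|port2" into tc u32 match clauses, switching to ip6 selectors
// for IPv6 addresses, as the injection loop above does.
func buildDstFilters(entry string) []string {
	parts := strings.Split(entry, "|")
	ip, ports := parts[0], parts[1:]
	proto := "ip"
	if strings.Contains(ip, ":") { // IPv6 addresses contain ':'
		proto = "ip6"
	}
	base := fmt.Sprintf("match %s dst %s", proto, ip)
	if len(ports) == 0 {
		return []string{base}
	}
	var filters []string
	for _, p := range ports {
		filters = append(filters, fmt.Sprintf("%s match %s dport %s 0xffff", base, proto, p))
	}
	return filters
}

func main() {
	fmt.Println(buildDstFilters("10.0.0.1|80|443"))
	fmt.Println(buildDstFilters("2001:db8::1"))
}
```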
// killnetem kills the netem process for all the target containers
func killnetem(PID int, networkInterface string) error {
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc delete dev %s root", PID, networkInterface)
func killnetem(target targetDetails, networkInterface string) (bool, error) {
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc delete dev %s root", target.NetworkNsPath, networkInterface)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
log.Info(cmd.String())
if err != nil {
log.Error(string(out))
log.Info(cmd.String())
// ignoring err if qdisc process doesn't exist inside the target container
if strings.Contains(string(out), qdiscNotFound) || strings.Contains(string(out), qdiscNoFileFound) {
log.Warn("The network chaos process has already been removed")
return nil
return true, err
}
return err
log.Error(err.Error())
return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", target.Name, target.Namespace, target.TargetContainer), Reason: fmt.Sprintf("failed to revert network faults: %s", string(out))}
}
return nil
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", target.Name, target.Namespace, target.TargetContainer)
return true, nil
}
//getENV fetches all the env variables from the runner pod
type targetDetails struct {
Name string
Namespace string
ServiceMesh string
DestinationIps []string
TargetContainer string
ContainerId string
Source string
NetworkNsPath string
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", ""))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.ContainerRuntime = types.Getenv("CONTAINER_RUNTIME", "")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "eth0")
experimentDetails.NetworkInterface = types.Getenv("NETWORK_INTERFACE", "")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
experimentDetails.DestinationIPs = types.Getenv("DESTINATION_IPS", "")
experimentDetails.SourcePorts = types.Getenv("SOURCE_PORTS", "")
experimentDetails.DestinationPorts = types.Getenv("DESTINATION_PORTS", "")
destIps = getDestinationIPs(experimentDetails.DestinationIPs)
if strings.TrimSpace(experimentDetails.DestinationPorts) != "" {
dPorts = strings.Split(strings.TrimSpace(experimentDetails.DestinationPorts), ",")
}
if strings.TrimSpace(experimentDetails.SourcePorts) != "" {
sPorts = strings.Split(strings.TrimSpace(experimentDetails.SourcePorts), ",")
}
}
func getDestinationIPs(ips string) []string {
if strings.TrimSpace(ips) == "" {
return nil
}
destIPs := strings.Split(strings.TrimSpace(ips), ",")
var uniqueIps []string
// removing duplicate ips from the list, if any
for i := range destIPs {
if !common.Contains(destIPs[i], uniqueIps) {
uniqueIps = append(uniqueIps, destIPs[i])
if strings.Contains(experimentDetails.DestinationPorts, "!") {
whitelistDPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.DestinationPorts), "!"), ",")
} else {
dPorts = strings.Split(strings.TrimSpace(experimentDetails.DestinationPorts), ",")
}
}
if strings.TrimSpace(experimentDetails.SourcePorts) != "" {
if strings.Contains(experimentDetails.SourcePorts, "!") {
whitelistSPorts = strings.Split(strings.TrimPrefix(strings.TrimSpace(experimentDetails.SourcePorts), "!"), ",")
} else {
sPorts = strings.Split(strings.TrimSpace(experimentDetails.SourcePorts), ",")
}
}
return uniqueIps
}
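getENV above now treats a leading `!` in SOURCE_PORTS/DESTINATION_PORTS as a whitelist marker: `!8080,8443` whitelists those ports while `80,443` targets them. A minimal sketch of that parsing, assuming the `!` prefix convention from the code (the function name `parsePorts` is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// parsePorts splits a port spec into targeted ports, or whitelisted
// ports when the spec is prefixed with "!", mirroring the env parsing
// above. An empty or blank spec yields neither.
func parsePorts(spec string) (ports, whitelist []string) {
	spec = strings.TrimSpace(spec)
	if spec == "" {
		return nil, nil
	}
	if strings.HasPrefix(spec, "!") {
		return nil, strings.Split(strings.TrimPrefix(spec, "!"), ",")
	}
	return strings.Split(spec, ","), nil
}

func main() {
	p, w := parsePorts("!8080,8443")
	fmt.Println(p, w)
	p, w = parsePorts("80,443")
	fmt.Println(p, w)
}
```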
// abortWatcher continuously watches for the abort signals
func abortWatcher(targetPID int, networkInterface, resultName, chaosNS, targetPodName string) {
func abortWatcher(targets []targetDetails, networkInterface, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
@ -279,15 +378,46 @@ func abortWatcher(targetPID int, networkInterface, resultName, chaosNS, targetPo
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err = killnetem(targetPID, networkInterface); err != nil {
log.Errorf("unable to kill netem process, err :%v", err)
for _, t := range targets {
killed, err := killnetem(t, networkInterface)
if err != nil && !killed {
log.Errorf("unable to kill netem process, err :%v", err)
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
}
}
retry--
time.Sleep(1 * time.Second)
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", targetPodName); err != nil {
log.Errorf("unable to annotate the chaosresult, err :%v", err)
}
log.Info("Chaos Revert Completed")
os.Exit(1)
}
func getDestIps(serviceMesh string) []string {
var (
destIps = os.Getenv("DESTINATION_IPS")
uniqueIps []string
)
if serviceMesh == "true" {
destIps = os.Getenv("DESTINATION_IPS_SERVICE_MESH")
}
if strings.TrimSpace(destIps) == "" {
return nil
}
ips := strings.Split(strings.TrimSpace(destIps), ",")
// removing duplicate ips from the list, if any
for i := range ips {
if !common.Contains(ips[i], uniqueIps) {
uniqueIps = append(uniqueIps, ips[i])
}
}
return uniqueIps
}
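The helper above consumes a TARGETS env value built as a `;`-separated list of `name:namespace:container:serviceMesh` tuples, rejecting anything that does not split into exactly four fields. A self-contained sketch of that decoding (the `target` struct and `parseTargets` name are illustrative, not the library's types):

```go
package main

import (
	"fmt"
	"strings"
)

// target holds one decoded TARGETS tuple.
type target struct {
	Name, Namespace, Container string
	ServiceMesh                string
}

// parseTargets decodes a ";"-separated list of
// "name:namespace:container:serviceMesh" tuples, as consumed by the
// helper above, returning an error for malformed entries.
func parseTargets(env string) ([]target, error) {
	var targets []target
	for _, t := range strings.Split(env, ";") {
		fields := strings.Split(t, ":")
		if len(fields) != 4 {
			return nil, fmt.Errorf("unsupported target format: %q", t)
		}
		targets = append(targets, target{fields[0], fields[1], fields[2], fields[3]})
	}
	return targets, nil
}

func main() {
	ts, err := parseTargets("nginx-0:default:nginx:false;web-1:prod:app:true")
	fmt.Println(ts, err)
}
```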


@ -1,15 +1,26 @@
package corruption
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
//PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkCorruptionFault")
defer span.End()
args := "corrupt " + experimentsDetails.NetworkPacketCorruptionPercentage
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
args := "netem corrupt " + experimentsDetails.NetworkPacketCorruptionPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
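The fault wrappers above (corruption, duplication, latency, loss) now build their qdisc argument string with an explicit `netem` prefix and append the optional correlation percentage when it is positive. A sketch of that composition shared by all four faults (the function name `buildNetemArgs` is illustrative):

```go
package main

import "fmt"

// buildNetemArgs composes the qdisc arguments passed down to
// PrepareAndInjectChaos: "netem <fault> <value>" plus an optional
// trailing correlation percentage, as the wrappers above do.
func buildNetemArgs(fault, value string, correlation int) string {
	args := "netem " + fault + " " + value
	if correlation > 0 {
		args = fmt.Sprintf("%s %d", args, correlation)
	}
	return args
}

func main() {
	fmt.Println(buildNetemArgs("corrupt", "100", 0))
	fmt.Println(buildNetemArgs("loss", "10", 25))
}
```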


@ -1,15 +1,26 @@
package duplication
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
//PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkDuplicationFault")
defer span.End()
args := "duplicate " + experimentsDetails.NetworkPacketDuplicationPercentage
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
args := "netem duplicate " + experimentsDetails.NetworkPacketDuplicationPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@ -1,17 +1,27 @@
package latency
import (
"context"
"fmt"
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
//PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLatencyFault")
defer span.End()
args := "delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms " + strconv.Itoa(experimentsDetails.Jitter) + "ms"
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
args := "netem delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms " + strconv.Itoa(experimentsDetails.Jitter) + "ms"
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@ -1,15 +1,26 @@
package loss
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
//PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLossFault")
defer span.End()
args := "loss " + experimentsDetails.NetworkPacketLossPercentage
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
args := "netem loss " + experimentsDetails.NetworkPacketLossPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}


@ -3,95 +3,50 @@ package lib
import (
"context"
"fmt"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
"net"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var serviceMesh = []string{"istio", "envoy"}
var destIpsSvcMesh string
var destIps string
//PrepareAndInjectChaos contains the prepration & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args string) error {
targetPodList := apiv1.PodList{}
var err error
var podsAffectedPerc int
// Get the target pod details for the chaos execution
// if the target pods are not defined, it derives a random target pod list using the pods-affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
//setup the tunables if provided in range
//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
logExperimentFields(experimentsDetails)
switch experimentsDetails.NetworkChaosType {
case "network-loss":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketLossPercentage": experimentsDetails.NetworkPacketLossPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "network-latency":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkLatency": strconv.Itoa(experimentsDetails.NetworkLatency),
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "network-corruption":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketCorruptionPercentage": experimentsDetails.NetworkPacketCorruptionPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "network-duplication":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketDuplicationPercentage": experimentsDetails.NetworkPacketDuplicationPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc)
if experimentsDetails.NodeLabel == "" {
//targetPodList, err := common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
} else {
//targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos: %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -103,40 +58,41 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the network chaos into all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInSerialMode")
defer span.End()
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -144,51 +100,27 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
destIPs, err := GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, isServiceMeshEnabledForPod(pod))
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not set destination ips")
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix, destIPs); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer, serviceMesh), pod.Spec.NodeName, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@@ -196,89 +128,68 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode injects the network chaos into all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkFaultInParallelMode")
defer span.End()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
targets, err := filterPodsForNodes(targetPodList, experimentsDetails, clients)
if err != nil {
return stacktrace.Propagate(err, "could not filter target pods")
}
destIPs, err := GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, isServiceMeshEnabledForPod(pod))
if err != nil {
return err
runID := stringutils.GetRunID()
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s:%s", k.Name, k.Namespace, k.TargetContainer, k.ServiceMesh))
}
//Get the target container name of the application pod
//It checks the empty target container for the first iteration only
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, args, labelSuffix, destIPs); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID, args); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
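In parallel mode, every target on a node is now packed into a single TARGETS value for that node's helper pod, encoded as `name:namespace:container:serviceMesh` entries joined with `;`. A minimal, self-contained sketch of that encoding (the `encodeTargets` helper and the sample values are illustrative, not part of the PR):

```go
package main

import (
	"fmt"
	"strings"
)

// target mirrors the per-pod fields the helper pod needs.
type target struct {
	Name, Namespace, TargetContainer, ServiceMesh string
}

// encodeTargets joins one node's targets into the TARGETS env format:
// "name:namespace:container:serviceMesh" entries separated by ";".
func encodeTargets(targets []target) string {
	var parts []string
	for _, t := range targets {
		parts = append(parts, fmt.Sprintf("%s:%s:%s:%s", t.Name, t.Namespace, t.TargetContainer, t.ServiceMesh))
	}
	return strings.Join(parts, ";")
}

func main() {
	ts := []target{
		{Name: "nginx-a", Namespace: "default", TargetContainer: "nginx", ServiceMesh: "false"},
		{Name: "nginx-b", Namespace: "default", TargetContainer: "nginx", ServiceMesh: "true"},
	}
	fmt.Println(encodeTargets(ts))
}
```

The helper pod splits the value on `;` and then on `:` to recover each target, so both separators must stay out of the encoded fields.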
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, args, labelSuffix, destIPs string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets string, nodeName, runID, args string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodNetworkFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
var (
privilegedEnable = true
terminationGracePeriodSeconds = int64(experimentsDetails.TerminationGracePeriodSeconds)
helperName = fmt.Sprintf("%s-helper-%s", experimentsDetails.ExperimentName, stringutils.GetRunID())
)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Name: helperName,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Tolerations: chaosDetails.Tolerations,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
@@ -306,7 +217,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name network-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName, args, destIPs),
Env: getPodEnv(ctx, experimentsDetails, targets, args),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
@@ -327,18 +238,40 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
// mount the network ns path for crio runtime
// it is required to access the sandbox network ns
if strings.ToLower(experimentsDetails.ContainerRuntime) == "crio" {
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, apiv1.Volume{
Name: "netns-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/netns",
},
},
})
helperPod.Spec.Containers[0].VolumeMounts = append(helperPod.Spec.Containers[0].VolumeMounts, apiv1.VolumeMount{
Name: "netns-path",
MountPath: "/var/run/netns",
})
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the envs required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, args, destIPs string) []apiv1.EnvVar {
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string, args string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@@ -348,23 +281,37 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName, a
SetEnv("NETWORK_INTERFACE", experimentsDetails.NetworkInterface).
SetEnv("EXPERIMENT_NAME", experimentsDetails.ExperimentName).
SetEnv("SOCKET_PATH", experimentsDetails.SocketPath).
SetEnv("DESTINATION_IPS", destIPs).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("DESTINATION_IPS", destIps).
SetEnv("DESTINATION_IPS_SERVICE_MESH", destIpsSvcMesh).
SetEnv("SOURCE_PORTS", experimentsDetails.SourcePorts).
SetEnv("DESTINATION_PORTS", experimentsDetails.DestinationPorts).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
}
type targetsDetails struct {
Target []target
}
type target struct {
Namespace string
Name string
TargetContainer string
ServiceMesh string
}
// GetTargetIps returns the comma-separated target ips
// It fetch the ips from the target ips (if defined by users)
// it append the ips from the host, if target host is provided
// It fetches the ips from the target ips (if defined by users)
// it appends the ips from the host, if target host is provided
func GetTargetIps(targetIPs, targetHosts string, clients clients.ClientSets, serviceMesh bool) (string, error) {
ipsFromHost, err := getIpsForTargetHosts(targetHosts, clients, serviceMesh)
if err != nil {
return "", err
return "", stacktrace.Propagate(err, "could not get ips from target hosts")
}
if targetIPs == "" {
targetIPs = ipsFromHost
@@ -374,31 +321,46 @@ func GetTargetIps(targetIPs, targetHosts string, clients clients.ClientSets, ser
return targetIPs, nil
}
// it derive the pod ips from the kubernetes service
// it derives the pod ips from the kubernetes service
func getPodIPFromService(host string, clients clients.ClientSets) ([]string, error) {
var ips []string
svcFields := strings.Split(host, ".")
if len(svcFields) != 5 {
return ips, fmt.Errorf("provide the valid FQDN for service in '<svc-name>.<namespace>.svc.cluster.local format, host: %v", host)
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{host: %s}", host), Reason: "provide the valid FQDN for service in '<svc-name>.<namespace>.svc.cluster.local format"}
}
svcName, svcNs := svcFields[0], svcFields[1]
svc, err := clients.KubeClient.CoreV1().Services(svcNs).Get(context.Background(), svcName, v1.GetOptions{})
svc, err := clients.GetService(svcNs, svcName)
if err != nil {
if k8serrors.IsForbidden(err) {
log.Warnf("forbidden - failed to get %v service in %v namespace, err: %v", svcName, svcNs, err)
return ips, nil
}
return ips, err
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{serviceName: %s, namespace: %s}", svcName, svcNs), Reason: err.Error()}
}
if svc.Spec.Selector == nil {
return nil, nil
}
var svcSelector string
for k, v := range svc.Spec.Selector {
pods, err := clients.KubeClient.CoreV1().Pods(svcNs).List(context.Background(), v1.ListOptions{LabelSelector: fmt.Sprintf("%s=%s", k, v)})
if err != nil {
return ips, err
}
for _, p := range pods.Items {
ips = append(ips, p.Status.PodIP)
if svcSelector == "" {
svcSelector += fmt.Sprintf("%s=%s", k, v)
continue
}
svcSelector += fmt.Sprintf(",%s=%s", k, v)
}
pods, err := clients.ListPods(svcNs, svcSelector)
if err != nil {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{svcName: %s, podLabel: %s, namespace: %s}", svcName, svcSelector, svcNs), Reason: fmt.Sprintf("failed to derive pods from service: %s", err.Error())}
}
for _, p := range pods.Items {
if p.Status.PodIP == "" {
continue
}
ips = append(ips, p.Status.PodIP)
}
return ips, nil
}
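The service lookup above does two pieces of plain string work: it splits the FQDN into service name and namespace, and it folds the service's selector map into a comma-separated label selector. A standalone sketch of those steps under assumed, illustrative function names (the real code uses the `clients` wrappers for the API calls):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseServiceFQDN extracts the service name and namespace from a
// "<svc-name>.<namespace>.svc.cluster.local" FQDN (exactly 5 dot-fields).
func parseServiceFQDN(host string) (name, namespace string, err error) {
	fields := strings.Split(host, ".")
	if len(fields) != 5 {
		return "", "", fmt.Errorf("invalid service FQDN: %s", host)
	}
	return fields[0], fields[1], nil
}

// buildLabelSelector folds a selector map into "k1=v1,k2=v2" form.
// Keys are sorted here only to make the demo output deterministic;
// Go map iteration order is otherwise random.
func buildLabelSelector(selector map[string]string) string {
	keys := make([]string, 0, len(selector))
	for k := range selector {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var parts []string
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%s", k, selector[k]))
	}
	return strings.Join(parts, ",")
}

func main() {
	name, ns, _ := parseServiceFQDN("carts.sock-shop.svc.cluster.local")
	fmt.Println(name, ns)
	fmt.Println(buildLabelSelector(map[string]string{"app": "carts", "tier": "backend"}))
}
```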
@@ -412,27 +374,49 @@ func getIpsForTargetHosts(targetHosts string, clients clients.ClientSets, servic
var commaSeparatedIPs []string
for i := range hosts {
hosts[i] = strings.TrimSpace(hosts[i])
if strings.Contains(hosts[i], "svc.cluster.local") && serviceMesh {
ips, err := getPodIPFromService(hosts[i], clients)
var (
hostName = hosts[i]
ports []string
)
if strings.Contains(hosts[i], "|") {
host := strings.Split(hosts[i], "|")
hostName = host[0]
ports = host[1:]
log.Infof("host: %v, ports: %v", hostName, ports)
}
if strings.Contains(hostName, "svc.cluster.local") && serviceMesh {
ips, err := getPodIPFromService(hostName, clients)
if err != nil {
return "", err
return "", stacktrace.Propagate(err, "could not get pod ips from service")
}
log.Infof("Host: {%v}, IP address: {%v}", hosts[i], ips)
commaSeparatedIPs = append(commaSeparatedIPs, ips...)
if ports != nil {
for j := range ips {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j]+"|"+strings.Join(ports, "|"))
}
} else {
commaSeparatedIPs = append(commaSeparatedIPs, ips...)
}
if finalHosts == "" {
finalHosts = hosts[i]
} else {
finalHosts = finalHosts + "," + hosts[i]
}
finalHosts = finalHosts + "," + hosts[i]
continue
}
ips, err := net.LookupIP(hosts[i])
ips, err := net.LookupIP(hostName)
if err != nil {
log.Warnf("Unknown host: {%v}, it won't be included in the scope of chaos", hosts[i])
log.Warnf("Unknown host: {%v}, it won't be included in the scope of chaos", hostName)
} else {
for j := range ips {
log.Infof("Host: {%v}, IP address: {%v}", hosts[i], ips[j])
log.Infof("Host: {%v}, IP address: {%v}", hostName, ips[j])
if ports != nil {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String()+"|"+strings.Join(ports, "|"))
continue
}
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String())
}
if finalHosts == "" {
@@ -443,14 +427,14 @@ func getIpsForTargetHosts(targetHosts string, clients clients.ClientSets, servic
}
}
if len(commaSeparatedIPs) == 0 {
return "", errors.Errorf("provided hosts: {%v} are invalid, unable to resolve", targetHosts)
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("hosts: %s", targetHosts), Reason: "provided hosts are invalid, unable to resolve"}
}
log.Infof("Injecting chaos on {%v} hosts", finalHosts)
return strings.Join(commaSeparatedIPs, ","), nil
}
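Port filtering rides on a `host|port1|port2` convention: each host entry is split on `|`, the host part is resolved to IPs, and every resolved IP is re-joined with the same port list. A hedged sketch of just the string handling, with no DNS and illustrative helper names:

```go
package main

import (
	"fmt"
	"strings"
)

// splitHostPorts separates a "host|port1|port2" entry into the host
// name and its (possibly empty) port list.
func splitHostPorts(entry string) (host string, ports []string) {
	fields := strings.Split(entry, "|")
	return fields[0], fields[1:]
}

// attachPorts re-joins each resolved IP with the ports using the same
// "ip|port1|port2" convention the helper pod later parses.
func attachPorts(ips, ports []string) []string {
	if len(ports) == 0 {
		return ips
	}
	out := make([]string, 0, len(ips))
	for _, ip := range ips {
		out = append(out, ip+"|"+strings.Join(ports, "|"))
	}
	return out
}

func main() {
	host, ports := splitHostPorts("example.com|80|443")
	fmt.Println(host, ports)
	fmt.Println(attachPorts([]string{"10.0.0.1", "10.0.0.2"}, ports))
}
```

An entry without `|` yields an empty port list, so `attachPorts` passes the IPs through unchanged, matching the unfiltered behaviour of the existing code path.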
//SetChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value.
// SetChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.NetworkPacketLossPercentage = common.ValidateRange(experimentsDetails.NetworkPacketLossPercentage)
experimentsDetails.NetworkPacketCorruptionPercentage = common.ValidateRange(experimentsDetails.NetworkPacketCorruptionPercentage)
@@ -462,9 +446,102 @@ func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
// It checks if the pod contains a service mesh sidecar
func isServiceMeshEnabledForPod(pod apiv1.Pod) bool {
for _, c := range pod.Spec.Containers {
if common.StringExistsInSlice(c.Name, serviceMesh) {
if common.SubStringExistsInSlice(c.Name, serviceMesh) {
return true
}
}
return false
}
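The sidecar detection switched from an exact name match to a substring match, so containers such as `istio-proxy` or `envoy-sidecar` are recognized. A minimal stand-in for that check (`subStringExistsInSlice` mirrors the intent of the `common.SubStringExistsInSlice` utility; it is re-implemented here only so the sketch is self-contained):

```go
package main

import (
	"fmt"
	"strings"
)

var serviceMesh = []string{"istio", "envoy"}

// subStringExistsInSlice reports whether any keyword occurs as a
// substring of s, so "istio-proxy" matches the "istio" keyword.
func subStringExistsInSlice(s string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(s, k) {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{"istio-proxy", "envoy-sidecar", "nginx"} {
		fmt.Printf("%s -> %v\n", name, subStringExistsInSlice(name, serviceMesh))
	}
}
```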
func setDestIps(pod apiv1.Pod, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
var err error
if isServiceMeshEnabledForPod(pod) {
if destIpsSvcMesh == "" {
destIpsSvcMesh, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, true)
if err != nil {
return "false", err
}
}
return "true", nil
}
if destIps == "" {
destIps, err = GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients, false)
if err != nil {
return "false", err
}
}
return "false", nil
}
func filterPodsForNodes(targetPodList apiv1.PodList, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (map[string]*targetsDetails, error) {
targets := make(map[string]*targetsDetails)
targetContainer := experimentsDetails.TargetContainer
for _, pod := range targetPodList.Items {
serviceMesh, err := setDestIps(pod, experimentsDetails, clients)
if err != nil {
return targets, stacktrace.Propagate(err, "could not set destination ips")
}
if experimentsDetails.TargetContainer == "" {
targetContainer = pod.Spec.Containers[0].Name
}
td := target{
Name: pod.Name,
Namespace: pod.Namespace,
TargetContainer: targetContainer,
ServiceMesh: serviceMesh,
}
if targets[pod.Spec.NodeName] == nil {
targets[pod.Spec.NodeName] = &targetsDetails{
Target: []target{td},
}
} else {
targets[pod.Spec.NodeName].Target = append(targets[pod.Spec.NodeName].Target, td)
}
}
return targets, nil
}
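filterPodsForNodes buckets the targets by node name so that parallel mode can spawn one helper pod per node and hand it all of that node's targets. The grouping itself is a plain map accumulation; a self-contained sketch with simplified, illustrative types:

```go
package main

import "fmt"

// target carries only the fields needed for the grouping demo.
type target struct {
	Name, Node string
}

// groupByNode buckets targets under their node name, mirroring the
// map[string]*targetsDetails accumulation in filterPodsForNodes.
func groupByNode(targets []target) map[string][]target {
	out := make(map[string][]target)
	for _, t := range targets {
		out[t.Node] = append(out[t.Node], t)
	}
	return out
}

func main() {
	grouped := groupByNode([]target{
		{Name: "pod-a", Node: "node-1"},
		{Name: "pod-b", Node: "node-2"},
		{Name: "pod-c", Node: "node-1"},
	})
	fmt.Println(len(grouped["node-1"]), len(grouped["node-2"]))
}
```

Appending to `out[t.Node]` works even for a node not yet in the map, because appending to a nil slice allocates a new one; the original code reaches the same result with an explicit nil check on the pointer value.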
func logExperimentFields(experimentsDetails *experimentTypes.ExperimentDetails) {
switch experimentsDetails.NetworkChaosType {
case "network-loss":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketLossPercentage": experimentsDetails.NetworkPacketLossPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-latency":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkLatency": strconv.Itoa(experimentsDetails.NetworkLatency),
"Jitter": experimentsDetails.Jitter,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-corruption":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketCorruptionPercentage": experimentsDetails.NetworkPacketCorruptionPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-duplication":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketDuplicationPercentage": experimentsDetails.NetworkPacketDuplicationPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-rate-limit":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkBandwidth": experimentsDetails.NetworkBandwidth,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
}
}


@@ -0,0 +1,29 @@
package rate
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkRateChaos contains the steps to prepare and inject chaos
func PodNetworkRateChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkRateLimit")
defer span.End()
args := fmt.Sprintf("tbf rate %s burst %s limit %s", experimentsDetails.NetworkBandwidth, experimentsDetails.Burst, experimentsDetails.Limit)
if experimentsDetails.PeakRate != "" {
args = fmt.Sprintf("%s peakrate %s", args, experimentsDetails.PeakRate)
}
if experimentsDetails.MinBurst != "" {
args = fmt.Sprintf("%s mtu %s", args, experimentsDetails.MinBurst)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
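The new rate-limit fault assembles a `tc` token bucket filter (tbf) argument string from the tunables, appending `peakrate` and `mtu` only when the corresponding env is set. A standalone sketch of just that formatting (sample values are illustrative, and the string is ultimately consumed by `tc qdisc` inside the helper pod):

```go
package main

import "fmt"

// buildTbfArgs mirrors how PodNetworkRateChaos assembles the tc-tbf
// qdisc arguments: rate/burst/limit are mandatory, peakrate and mtu
// are appended only when provided.
func buildTbfArgs(bandwidth, burst, limit, peakRate, minBurst string) string {
	args := fmt.Sprintf("tbf rate %s burst %s limit %s", bandwidth, burst, limit)
	if peakRate != "" {
		args = fmt.Sprintf("%s peakrate %s", args, peakRate)
	}
	if minBurst != "" {
		args = fmt.Sprintf("%s mtu %s", args, minBurst)
	}
	return args
}

func main() {
	fmt.Println(buildTbfArgs("1mbit", "32kb", "64kb", "", ""))
	fmt.Println(buildTbfArgs("1mbit", "32kb", "64kb", "2mbit", "1514"))
}
```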


@@ -2,10 +2,16 @@ package lib
import (
"context"
"fmt"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-cpu-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -13,23 +19,25 @@ import (
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeCPUHog contains prepration steps before chaos injection
func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeCPUHog contains preparation steps before chaos injection
func PrepareNodeCPUHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeCPUHogFault")
defer span.End()
//setup the tunables if provided in range
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"Node CPU Cores": experimentsDetails.NodeCPUcores,
"CPU Load": experimentsDetails.CPULoad,
"Node Affce Perc": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
"Node CPU Cores": experimentsDetails.NodeCPUcores,
"CPU Load": experimentsDetails.CPULoad,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
//Waiting for the ramp time before chaos injection
@@ -42,7 +50,7 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
@@ -52,21 +60,21 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -78,14 +86,15 @@ func PrepareNodeCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, cl
}
// injectChaosInSerialMode stresses the cpu of all the target nodes serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInSerialMode")
defer span.End()
nodeCPUCores := experimentsDetails.NodeCPUcores
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -101,29 +110,29 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// When the number of cpu cores for hogging is not defined, it will be taken from the node capacity
if nodeCPUCores == "0" {
if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not get node cpu capacity")
}
}
log.InfoWithValues("[Info]: Details of Node under chaos injection", logrus.Fields{
"NodeName": appNode,
"NodeCPUcores": experimentsDetails.NodeCPUcores,
"NodeCPUCores": experimentsDetails.NodeCPUcores,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
// Creating the helper pod to perform node cpu hog
if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
@@ -132,32 +141,35 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
}
}
return nil
}
// injectChaosInParallelMode stresses the cpu of all the target nodes in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
nodeCPUCores := experimentsDetails.NodeCPUcores
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeCPUHogFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
nodeCPUCores := experimentsDetails.NodeCPUcores
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@@ -169,7 +181,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
// When the number of cpu cores for hogging is not defined, it will be taken from the node capacity
if nodeCPUCores == "0" {
if err := setCPUCapacity(experimentsDetails, appNode, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not get node cpu capacity")
}
}
@@ -178,65 +190,44 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"NodeCPUcores": experimentsDetails.NodeCPUcores,
})
experimentsDetails.RunID = common.GetRunID()
// Creating the helper pod to perform node cpu hog
if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
//setCPUCapacity fetch the node cpu capacity
// setCPUCapacity fetches the node cpu capacity
func setCPUCapacity(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets) error {
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), appNode, v1.GetOptions{})
node, err := clients.GetNode(appNode, experimentsDetails.Timeout, experimentsDetails.Delay)
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNode), Reason: err.Error()}
}
experimentsDetails.NodeCPUcores = node.Status.Capacity.Cpu().String()
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeCPUHogFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
@@ -265,12 +256,20 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
//setChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value.
// setChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.NodeCPUcores = common.ValidateRange(experimentsDetails.NodeCPUcores)
experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)


@@ -1,8 +1,8 @@
package lib
import (
"bytes"
"context"
"fmt"
"os"
"os/exec"
"os/signal"
@@ -11,7 +12,12 @@ import (
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-drain/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -20,7 +25,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
apierrors "k8s.io/apimachinery/pkg/api/errors"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
@@ -30,8 +34,10 @@ var (
inject, abort chan os.Signal
)
//PrepareNodeDrain contains the prepration steps before chaos injection
func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeDrain contains the preparation steps before chaos injection
func PrepareNodeDrain(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeDrainFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -53,7 +59,7 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@@ -65,7 +71,7 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -74,18 +80,22 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)
// Drain the application node
if err := drainNode(experimentsDetails, clients, chaosDetails); err != nil {
return err
if err := drainNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because of an error during node drain")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return stacktrace.Propagate(err, "could not drain node")
}
// Verify the status of AUT after reschedule
log.Info("[Status]: Verify the status of AUT after reschedule")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because application status check failed")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
log.Errorf("Unable to uncordon the node, err: %v", uncordonErr)
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return errors.Errorf("application status check failed, err: %v", err)
return err
}
// Verify the status of Auxiliary Applications after reschedule
@@ -94,9 +104,9 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
if uncordonErr := uncordonNode(experimentsDetails, clients, chaosDetails); uncordonErr != nil {
log.Errorf("Unable to uncordon the node, err: %v", uncordonErr)
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(uncordonErr).Error())}
}
return errors.Errorf("auxiliary Applications status check failed, err: %v", err)
return err
}
}
@@ -108,7 +118,7 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
// Uncordon the application node
if err := uncordonNode(experimentsDetails, clients, chaosDetails); err != nil {
return err
return stacktrace.Propagate(err, "could not uncordon the target node")
}
//Waiting for the ramp time after chaos injection
@@ -119,8 +129,10 @@ func PrepareNodeDrain(experimentsDetails *experimentTypes.ExperimentDetails, cli
return nil
}
// drainNode drain the application node
func drainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// drainNode drains the target node
func drainNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeDrainFault")
defer span.End()
select {
case <-inject:
@@ -130,12 +142,8 @@ func drainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
log.Infof("[Inject]: Draining the %v node", experimentsDetails.TargetNode)
command := exec.Command("kubectl", "drain", experimentsDetails.TargetNode, "--ignore-daemonsets", "--delete-emptydir-data", "--force", "--timeout", strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
var out, stderr bytes.Buffer
command.Stdout = &out
command.Stderr = &stderr
if err := command.Run(); err != nil {
log.Infof("Error String: %v", stderr.String())
return errors.Errorf("Unable to drain the %v node, err: %v", experimentsDetails.TargetNode, err)
if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), "failed to drain the target node", cerrors.ErrorTypeChaosInject); err != nil {
return err
}
common.SetTargets(experimentsDetails.TargetNode, "injected", "node", chaosDetails)
@@ -146,10 +154,10 @@ func drainNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
Try(func(attempt uint) error {
nodeSpec, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
if !nodeSpec.Spec.Unschedulable {
return errors.Errorf("%v node is not in unschedulable state", experimentsDetails.TargetNode)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{node: %s}", experimentsDetails.TargetNode), Reason: "node is not in unschedulable state"}
}
return nil
})
@@ -164,25 +172,21 @@ func uncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients
for _, targetNode := range targetNodes {
//Check node exist before uncordon the node
_, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), targetNode, v1.GetOptions{})
_, err := clients.GetNode(targetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
if apierrors.IsNotFound(err) {
log.Infof("[Info]: The %v node is no longer exist, skip uncordon the node", targetNode)
common.SetTargets(targetNode, "noLongerExist", "node", chaosDetails)
continue
} else {
return errors.Errorf("unable to get the %v node, err: %v", targetNode, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
}
}
log.Infof("[Recover]: Uncordon the %v node", targetNode)
command := exec.Command("kubectl", "uncordon", targetNode)
var out, stderr bytes.Buffer
command.Stdout = &out
command.Stderr = &stderr
if err := command.Run(); err != nil {
log.Infof("Error String: %v", stderr.String())
return errors.Errorf("unable to uncordon the %v node, err: %v", targetNode, err)
if err := common.RunCLICommands(command, "", fmt.Sprintf("{node: %s}", targetNode), "failed to uncordon the target node", cerrors.ErrorTypeChaosInject); err != nil {
return err
}
common.SetTargets(targetNode, "reverted", "node", chaosDetails)
}
@@ -198,11 +202,11 @@ func uncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients
if apierrors.IsNotFound(err) {
continue
} else {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: err.Error()}
}
}
if nodeSpec.Spec.Unschedulable {
return errors.Errorf("%v node is in unschedulable state", experimentsDetails.TargetNode)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{node: %s}", targetNode), Reason: "target node is in unschedulable state"}
}
}
return nil


@@ -2,10 +2,16 @@ package lib
import (
"context"
"fmt"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-io-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -13,16 +19,17 @@ import (
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeIOStress contains prepration steps before chaos injection
func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//setup the tunables if provided in range
// PrepareNodeIOStress contains preparation steps before chaos injection
func PrepareNodeIOStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeIOStressFault")
defer span.End()
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
@@ -30,7 +37,7 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"CPU Core": experimentsDetails.CPU,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Node Affce Perc": experimentsDetails.NodesAffectedPerc,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
@@ -44,7 +51,7 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
@@ -53,21 +60,21 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -79,13 +86,13 @@ func PrepareNodeIOStress(experimentsDetails *experimentTypes.ExperimentDetails,
}
// injectChaosInSerialMode stresses the io of all the target nodes serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -104,52 +111,45 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
// Creating the helper pod to perform node io stress
if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(appNode, "injected", "node", chaosDetails)
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
common.SetTargets(appNode, "reverted", "node", chaosDetails)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
common.SetTargets(appNode, "targeted", "node", chaosDetails)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
}
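Across all three files the helper-pod selector moves from per-pod `name=...` labels (and a separate `labelSuffix`) to a single shared `app=<experiment>-helper-<runID>` label built with `fmt.Sprintf`, so one selector covers every helper pod created for a run and cleanup can act on the whole set at once. A minimal sketch of that scheme (the function name is illustrative):

```go
package main

import "fmt"

// buildHelperSelector shows the label scheme the diff converges on: all
// helper pods for one run share "app=<experiment>-helper-<runID>", so
// status checks and DeleteAllPod-style cleanup need only this selector.
func buildHelperSelector(experiment, runID string) string {
	return fmt.Sprintf("app=%s-helper-%s", experiment, runID)
}

func main() {
	fmt.Println(buildHelperSelector("node-cpu-hog", "abc12"))
	// -> app=node-cpu-hog-helper-abc12
}
```

Pairing this with `GenerateName` on the pod spec (as the createHelperPod diff does) lets the API server pick unique pod names while the run ID in the label keeps the set addressable.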
// injectChaosInParallelMode stresses the io of all the target nodes in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeIOStressFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@@ -164,57 +164,37 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
})
experimentsDetails.RunID = common.GetRunID()
// Creating the helper pod to perform node io stress
if err := createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "injected", "node", chaosDetails)
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "reverted", "node", chaosDetails)
}
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeIOStressFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
@@ -236,8 +216,16 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
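Several hunks above replace a fixed pod `Name` with `GenerateName` and then locate the helper via its labels instead, using the `app=<experiment>-helper-<runID>` selector built with `fmt.Sprintf` in the new code. A small, self-contained sketch of that selector construction (the sample experiment name and run ID are illustrative only):

```go
package main

import "fmt"

// helperSelector builds the label selector used in this changeset to
// find helper pods. With GenerateName the pod's final name is no longer
// deterministic, so lookups key off the app=<experiment>-helper-<runID>
// label rather than the pod name.
func helperSelector(experimentName, runID string) string {
	return fmt.Sprintf("app=%s-helper-%s", experimentName, runID)
}

var selector = helperSelector("node-io-stress", "abc12")

func main() {
	fmt.Println(selector)
}
```

The same selector string can then be passed as a label selector when listing or waiting on the helper pods.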
// getContainerArguments derives the args for the pumba stress helper pod
@@ -279,8 +267,8 @@ func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails
return stressArgs
}
//setChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value.
// setChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.FilesystemUtilizationBytes = common.ValidateRange(experimentsDetails.FilesystemUtilizationBytes)
experimentsDetails.FilesystemUtilizationPercentage = common.ValidateRange(experimentsDetails.FilesystemUtilizationPercentage)


@@ -2,34 +2,41 @@ package lib
import (
"context"
"fmt"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareNodeMemoryHog contains prepration steps before chaos injection
func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeMemoryHog contains preparation steps before chaos injection
func PrepareNodeMemoryHog(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeMemoryHogFault")
defer span.End()
//setup the tunables if provided in range
//set up the tunables if provided in range
setChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The details of chaos tunables are:", logrus.Fields{
"MemoryConsumptionMebibytes": experimentsDetails.MemoryConsumptionMebibytes,
"MemoryConsumptionPercentage": experimentsDetails.MemoryConsumptionPercentage,
"NumberOfWorkers": experimentsDetails.NumberOfWorkers,
"Node Affce Perc": experimentsDetails.NodesAffectedPerc,
"Node Affected Percentage": experimentsDetails.NodesAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
@@ -43,8 +50,9 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
nodesAffectedPerc, _ := strconv.Atoi(experimentsDetails.NodesAffectedPerc)
targetNodeList, err := common.GetNodeList(experimentsDetails.TargetNodes, experimentsDetails.NodeLabel, nodesAffectedPerc, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node list")
}
log.InfoWithValues("[Info]: Details of Nodes under chaos injection", logrus.Fields{
"No. Of Nodes": len(targetNodeList),
"Node Names": targetNodeList,
@@ -52,21 +60,21 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetNodeList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -78,13 +86,13 @@ func PrepareNodeMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails,
}
// injectChaosInSerialMode stresses the memory of all the target nodes serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -103,68 +111,50 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
})
experimentsDetails.RunID = common.GetRunID()
experimentsDetails.RunID = stringutils.GetRunID()
//Getting node memory details
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil {
return errors.Errorf("unable to get the node memory details, err: %v", err)
return stacktrace.Propagate(err, "could not get node memory details")
}
//Getting the exact memory value to exhaust
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, clients, memoryCapacity, memoryAllocatable)
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil {
return errors.Errorf("memory calculation failed, err: %v", err)
return stacktrace.Propagate(err, "could not calculate memory consumption value")
}
// Creating the helper pod to perform node memory hog
if err = createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix, MemoryConsumption); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
common.SetTargets(appNode, "targeted", "node", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
} else if podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod status is %v", podStatus)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode stresses the memory of all the target nodes in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
labelSuffix := common.GetRunID()
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetNodeList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeMemoryHogFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.RunID = stringutils.GetRunID()
for _, appNode := range targetNodeList {
if experimentsDetails.EngineName != "" {
@@ -179,54 +169,32 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"Memory Consumption Mebibytes": experimentsDetails.MemoryConsumptionMebibytes,
})
experimentsDetails.RunID = common.GetRunID()
//Getting node memory details
memoryCapacity, memoryAllocatable, err := getNodeMemoryDetails(appNode, clients)
if err != nil {
return errors.Errorf("unable to get the node memory details, err: %v", err)
return stacktrace.Propagate(err, "could not get node memory details")
}
//Getting the exact memory value to exhaust
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, clients, memoryCapacity, memoryAllocatable)
MemoryConsumption, err := calculateMemoryConsumption(experimentsDetails, memoryCapacity, memoryAllocatable)
if err != nil {
return errors.Errorf("memory calculation failed, err: %v", err)
return stacktrace.Propagate(err, "could not calculate memory consumption value")
}
// Creating the helper pod to perform node memory hog
if err = createHelperPod(experimentsDetails, chaosDetails, appNode, clients, labelSuffix, MemoryConsumption); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, appNode, clients, MemoryConsumption); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
} else if podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod status is %v", podStatus)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
@@ -234,25 +202,23 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
// getNodeMemoryDetails will return the total memory capacity and memory allocatable of an application node
func getNodeMemoryDetails(appNodeName string, clients clients.ClientSets) (int, int, error) {
nodeDetails, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), appNodeName, v1.GetOptions{})
nodeDetails, err := clients.GetNode(appNodeName, 180, 2)
if err != nil {
return 0, 0, err
return 0, 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: err.Error()}
}
memoryCapacity := int(nodeDetails.Status.Capacity.Memory().Value())
memoryAllocatable := int(nodeDetails.Status.Allocatable.Memory().Value())
if memoryCapacity == 0 || memoryAllocatable == 0 {
return memoryCapacity, memoryAllocatable, errors.Errorf("failed to get memory details of the application node")
return memoryCapacity, memoryAllocatable, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: "failed to get memory details of the target node"}
}
return memoryCapacity, memoryAllocatable, nil
}
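The PR description mentions adding retries with timeout for Kubernetes and Litmus client operations, and the new `clients.GetNode(appNodeName, 180, 2)` call above suggests trailing timeout and delay arguments in seconds. A minimal sketch of that retry-with-timeout pattern under those assumptions (`retryWithTimeout` is a hypothetical helper, not the library's actual API):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryWithTimeout repeatedly invokes op until it succeeds or the
// timeout elapses, sleeping delay seconds between attempts. The
// (timeout, delay) pairing mirrors the (180, 2) style arguments seen in
// calls such as clients.GetNode in this changeset (assumed semantics).
func retryWithTimeout(timeout, delay int, op func() error) error {
	deadline := time.Now().Add(time.Duration(timeout) * time.Second)
	var lastErr error
	for time.Now().Before(deadline) {
		if lastErr = op(); lastErr == nil {
			return nil
		}
		time.Sleep(time.Duration(delay) * time.Second)
	}
	return fmt.Errorf("operation did not succeed within %ds: %w", timeout, lastErr)
}

var attempts int

// A flaky operation that succeeds on the third attempt; delay is 0 so
// the demo runs instantly.
var retryErr = retryWithTimeout(5, 0, func() error {
	attempts++
	if attempts < 3 {
		return errors.New("transient failure")
	}
	return nil
})

func main() {
	fmt.Println(attempts, retryErr)
}
```

Wrapping flaky API calls this way keeps transient apiserver errors from failing an entire chaos run.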
// calculateMemoryConsumption will calculate the amount of memory to be consumed for a given unit.
func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, memoryCapacity, memoryAllocatable int) (string, error) {
func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDetails, memoryCapacity, memoryAllocatable int) (string, error) {
var totalMemoryConsumption int
var MemoryConsumption string
@@ -279,12 +245,12 @@ func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
//Getting the total memory under chaos
memoryConsumptionPercentage, _ := strconv.ParseFloat(experimentsDetails.MemoryConsumptionPercentage, 64)
memoryForChaos := ((memoryConsumptionPercentage / 100) * float64(memoryCapacity))
memoryForChaos := (memoryConsumptionPercentage / 100) * float64(memoryCapacity)
//Get the percentage of memory under chaos wrt allocatable memory
totalMemoryConsumption = int((float64(memoryForChaos) / float64(memoryAllocatable)) * 100)
totalMemoryConsumption = int((memoryForChaos / float64(memoryAllocatable)) * 100)
if totalMemoryConsumption > 100 {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %d percent, which is more than 100 percent (%d percent) of Allocatable Memory, so the experiment will only consume upto 100 percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is more than 100 percent (%d percent) of Allocatable Memory, so the experiment will only consume upto 100 percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
MemoryConsumption = "100%"
} else {
log.Infof("[Info]: PercentageOfMemoryCapacity To Be Used: %v percent, which is %d percent of Allocatable Memory", experimentsDetails.MemoryConsumptionPercentage, totalMemoryConsumption)
@@ -310,20 +276,22 @@ func calculateMemoryConsumption(experimentsDetails *experimentTypes.ExperimentDe
}
return MemoryConsumption, nil
}
return "", errors.Errorf("please specify the memory consumption value either in percentage or mebibytes in a non-decimal format using respective envs")
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: "specify the memory consumption value either in percentage or mebibytes in a non-decimal format using respective envs"}
}
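The `calculateMemoryConsumption` hunks above convert a requested percentage of total node memory capacity into the equivalent percentage of allocatable memory, capping the result at 100. That arithmetic can be exercised in isolation (function name and sample sizes are illustrative):

```go
package main

import "fmt"

// percentOfAllocatable converts a requested percentage of total node
// memory capacity into the equivalent percentage of allocatable memory,
// capped at 100 — the same calculation performed in
// calculateMemoryConsumption above.
func percentOfAllocatable(requestedPct float64, capacity, allocatable int) int {
	memoryForChaos := (requestedPct / 100) * float64(capacity)
	pct := int((memoryForChaos / float64(allocatable)) * 100)
	if pct > 100 {
		return 100
	}
	return pct
}

func main() {
	// 50% of an 8 GiB node whose allocatable memory is 6 GiB.
	fmt.Println(percentOfAllocatable(50, 8<<30, 6<<30))
}
```

The cap matters because allocatable memory is always less than capacity, so a high capacity percentage can exceed 100% of what the stressor may actually consume.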
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, labelSuffix, MemoryConsumption string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, appNode string, clients clients.ClientSets, MemoryConsumption string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeMemoryHogFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
@@ -352,12 +320,20 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
//setChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value.
// setChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func setChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.MemoryConsumptionMebibytes = common.ValidateRange(experimentsDetails.MemoryConsumptionMebibytes)
experimentsDetails.MemoryConsumptionPercentage = common.ValidateRange(experimentsDetails.MemoryConsumptionPercentage)


@@ -6,19 +6,21 @@ import (
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-restart/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
corev1 "k8s.io/kubernetes/pkg/apis/core"
)
var err error
@@ -32,17 +34,20 @@ const (
privateKeySecret string = "private-key-cm-"
emptyDirVolume string = "empty-dir-"
ObjectNameField = "metadata.name"
)
// PrepareNodeRestart contains preparation steps before chaos injection
func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareNodeRestart(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeRestartFault")
defer span.End()
//Select the node
if experimentsDetails.TargetNode == "" {
//Select node for node-restart
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@@ -50,7 +55,7 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
if experimentsDetails.TargetNodeIP == "" {
experimentsDetails.TargetNodeIP, err = getInternalIP(experimentsDetails.TargetNode, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get internal ip")
}
}
@@ -59,8 +64,7 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
"Target Node IP": experimentsDetails.TargetNodeIP,
})
experimentsDetails.RunID = common.GetRunID()
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID
experimentsDetails.RunID = stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -79,39 +83,19 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
}
// Creating the helper pod to perform node restart
if err = createHelperPod(experimentsDetails, chaosDetails, clients); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err = createHelperPod(ctx, experimentsDetails, chaosDetails, clients); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
}
common.SetTargets(experimentsDetails.TargetNode, "targeted", "node", chaosDetails)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@@ -119,14 +103,17 @@ func PrepareNodeRestart(experimentsDetails *experimentTypes.ExperimentDetails, c
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", strconv.Itoa(experimentsDetails.RampTime))
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, clients clients.ClientSets) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, chaosDetails *types.ChaosDetails, clients clients.ClientSets) error {
// This method is attaching emptyDir along with secret volume, and copy data from secret
// to the emptyDir, because secret is mounted as readonly and with 777 perms and it can't be changed
// because of: https://github.com/kubernetes/kubernetes/issues/57923
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreateNodeRestartFaultHelperPod")
defer span.End()
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
@@ -134,7 +121,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + experimentsDetails.RunID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, "", experimentsDetails.ExperimentName),
Labels: common.GetHelperLabels(chaosDetails.Labels, experimentsDetails.RunID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
@@ -148,7 +135,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
{
MatchFields: []apiv1.NodeSelectorRequirement{
{
Key: corev1.ObjectNameField,
Key: ObjectNameField,
Operator: apiv1.NodeSelectorOpNotIn,
Values: []string{experimentsDetails.TargetNode},
},
@@ -199,20 +186,28 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, chao
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getInternalIP gets the internal ip of the given node
func getInternalIP(nodeName string, clients clients.ClientSets) (string, error) {
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), nodeName, v1.GetOptions{})
node, err := clients.GetNode(nodeName, 180, 2)
if err != nil {
return "", err
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: err.Error()}
}
for _, addr := range node.Status.Addresses {
if strings.ToLower(string(addr.Type)) == "internalip" {
return addr.Address, nil
}
}
return "", errors.Errorf("unable to find the internal ip of the %v node", nodeName)
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: "failed to get the internal ip of the target node"}
}
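The address selection in `getInternalIP` can be sketched independently of the Kubernetes client. This is a minimal, hypothetical sketch: the `nodeAddress` type below is a stand-in for `corev1.NodeAddress`, not the real API type.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeAddress is a hypothetical stand-in for corev1.NodeAddress.
type nodeAddress struct {
	Type    string
	Address string
}

// pickInternalIP mirrors the lookup above: return the first address
// whose type matches "InternalIP" (case-insensitively), or an error
// if the node exposes no internal IP.
func pickInternalIP(addrs []nodeAddress) (string, error) {
	for _, a := range addrs {
		if strings.ToLower(a.Type) == "internalip" {
			return a.Address, nil
		}
	}
	return "", fmt.Errorf("failed to get the internal ip of the target node")
}

func main() {
	addrs := []nodeAddress{
		{Type: "Hostname", Address: "node-1"},
		{Type: "InternalIP", Address: "10.0.0.7"},
	}
	ip, err := pickInternalIP(addrs)
	fmt.Println(ip, err) // 10.0.0.7 <nil>
}
```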


@@ -2,13 +2,19 @@ package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-taint/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -16,9 +22,7 @@ import (
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var (
@@ -26,8 +30,10 @@ var (
inject, abort chan os.Signal
)
//PrepareNodeTaint contains the prepration steps before chaos injection
func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareNodeTaint contains the preparation steps before chaos injection
func PrepareNodeTaint(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareNodeTaintFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -49,7 +55,7 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
//Select node for kubelet-service-kill
experimentsDetails.TargetNode, err = common.GetNodeName(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.NodeLabel, clients)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get node name")
}
}
@@ -61,7 +67,7 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -70,21 +76,28 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
go abortWatcher(experimentsDetails, clients, resultDetails, chaosDetails, eventsDetails)
// taint the application node
if err := taintNode(experimentsDetails, clients, chaosDetails); err != nil {
return err
if err := taintNode(ctx, experimentsDetails, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not taint node")
}
// Verify the status of AUT after reschedule
log.Info("[Status]: Verify the status of AUT after reschedule")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return errors.Errorf("application status check failed, err: %v", err)
if err = status.AUTStatusCheck(clients, chaosDetails); err != nil {
log.Info("[Revert]: Reverting chaos because application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
return err
}
// Verify the status of Auxiliary Applications after reschedule
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running")
if err = status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return errors.Errorf("auxiliary Applications status check failed, err: %v", err)
log.Info("[Revert]: Reverting chaos because auxiliary application status check failed")
if taintErr := removeTaintFromNode(experimentsDetails, clients, chaosDetails); taintErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(taintErr).Error())}
}
return err
}
}
@@ -96,7 +109,7 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
// remove taint from the application node
if err := removeTaintFromNode(experimentsDetails, clients, chaosDetails); err != nil {
return err
return stacktrace.Propagate(err, "could not remove taint from node")
}
//Waiting for the ramp time after chaos injection
@@ -108,7 +121,9 @@ func PrepareNodeTaint(experimentsDetails *experimentTypes.ExperimentDetails, cli
}
// taintNode taint the application node
func taintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
func taintNode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectNodeTaintFault")
defer span.End()
// get the taint labels & effect
taintKey, taintValue, taintEffect := getTaintDetails(experimentsDetails)
@@ -116,9 +131,9 @@ func taintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
log.Infof("Add %v taints to the %v node", taintKey+"="+taintValue+":"+taintEffect, experimentsDetails.TargetNode)
// get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil || node == nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.TargetNode, err)
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
// check if the taint already exists
@@ -142,9 +157,8 @@ func taintNode(experimentsDetails *experimentTypes.ExperimentDetails, clients cl
Effect: apiv1.TaintEffect(taintEffect),
})
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(context.Background(), node, v1.UpdateOptions{})
if err != nil || updatedNodeWithTaint == nil {
return errors.Errorf("failed to update %v node after adding taints, err: %v", experimentsDetails.TargetNode, err)
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to add taints: %s", err.Error())}
}
}
@@ -163,9 +177,9 @@ func removeTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails,
taintKey := strings.Split(taintLabel[0], "=")[0]
// get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{})
if err != nil || node == nil {
return errors.Errorf("failed to get %v node, err: %v", experimentsDetails.TargetNode, err)
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
// check if the taint already exists
@@ -178,17 +192,16 @@ func removeTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails,
}
if tainted {
var Newtaints []apiv1.Taint
var newTaints []apiv1.Taint
// remove all the taints with matching key
for _, taint := range node.Spec.Taints {
if taint.Key != taintKey {
Newtaints = append(Newtaints, taint)
newTaints = append(newTaints, taint)
}
}
node.Spec.Taints = Newtaints
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(context.Background(), node, v1.UpdateOptions{})
if err != nil || updatedNodeWithTaint == nil {
return errors.Errorf("failed to update %v node after removing taints, err: %v", experimentsDetails.TargetNode, err)
node.Spec.Taints = newTaints
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to remove taints: %s", err.Error())}
}
}
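The revert step above keeps every taint whose key differs from the chaos taint key. The filtering itself is a pure function, sketched here with a hypothetical `taint` struct standing in for `apiv1.Taint`:

```go
package main

import "fmt"

// taint is a hypothetical stand-in for apiv1.Taint.
type taint struct {
	Key, Value, Effect string
}

// dropTaintsWithKey mirrors the revert logic above: keep every taint
// whose key does not match the chaos taint key.
func dropTaintsWithKey(taints []taint, key string) []taint {
	var kept []taint
	for _, t := range taints {
		if t.Key != key {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	taints := []taint{
		{Key: "node.litmus/chaos", Value: "true", Effect: "NoExecute"},
		{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"},
	}
	fmt.Println(dropTaintsWithKey(taints, "node.litmus/chaos"))
	// [{dedicated gpu NoSchedule}]
}
```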


@@ -2,16 +2,22 @@ package lib
import (
"context"
"math"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-autoscaler/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/math"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
@@ -20,8 +26,6 @@ import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
appsv1 "k8s.io/client-go/kubernetes/typed/apps/v1"
retries "k8s.io/client-go/util/retry"
"github.com/pkg/errors"
)
var (
@@ -30,8 +34,10 @@ var (
appsv1StatefulsetClient appsv1.StatefulSetInterface
)
//PreparePodAutoscaler contains the prepration steps and chaos injection steps
func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PreparePodAutoscaler contains the preparation steps and chaos injection steps
func PreparePodAutoscaler(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodAutoscalerFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -46,9 +52,9 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
switch strings.ToLower(experimentsDetails.AppKind) {
case "deployment", "deployments":
appsUnderTest, err := getDeploymentDetails(experimentsDetails, clients)
appsUnderTest, err := getDeploymentDetails(experimentsDetails)
if err != nil {
return errors.Errorf("fail to get the name & initial replica count of the deployment, err: %v", err)
return stacktrace.Propagate(err, "could not get deployment details")
}
deploymentList := []string{}
@@ -63,22 +69,22 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
//calling go routine which will continuously watch for the abort signal
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err = podAutoscalerChaosInDeployment(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("fail to perform autoscaling, err: %v", err)
if err = podAutoscalerChaosInDeployment(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not scale deployment")
}
if err = autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to rollback the autoscaling, err: %v", err)
return stacktrace.Propagate(err, "could not revert scaling in deployment")
}
case "statefulset", "statefulsets":
appsUnderTest, err := getStatefulsetDetails(experimentsDetails, clients)
appsUnderTest, err := getStatefulsetDetails(experimentsDetails)
if err != nil {
return errors.Errorf("fail to get the name & initial replica count of the statefulset, err: %v", err)
return stacktrace.Propagate(err, "could not get statefulset details")
}
stsList := []string{}
var stsList []string
for _, sts := range appsUnderTest {
stsList = append(stsList, sts.AppName)
}
@@ -90,16 +96,16 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
//calling go routine which will continuously watch for the abort signal
go abortPodAutoScalerChaos(appsUnderTest, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails)
if err = podAutoscalerChaosInStatefulset(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("fail to perform autoscaling, err: %v", err)
if err = podAutoscalerChaosInStatefulset(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not scale statefulset")
}
if err = autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to rollback the autoscaling, err: %v", err)
return stacktrace.Propagate(err, "could not revert scaling in statefulset")
}
default:
return errors.Errorf("application type '%s' is not supported for the chaos", experimentsDetails.AppKind)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{kind: %s}", experimentsDetails.AppKind), Reason: "application type is not supported"}
}
//Waiting for the ramp time after chaos injection
@@ -110,38 +116,38 @@ func PreparePodAutoscaler(experimentsDetails *experimentTypes.ExperimentDetails,
return nil
}
func getSliceOfTotalApplicationsTargeted(appList []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
func getSliceOfTotalApplicationsTargeted(appList []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails) []experimentTypes.ApplicationUnderTest {
slice := int(math.Round(float64(len(appList)*experimentsDetails.AppAffectPercentage) / float64(100)))
if slice < 0 || slice > len(appList) {
return nil, errors.Errorf("slice of applications to target out of range %d/%d", slice, len(appList))
}
return appList[:slice], nil
newAppListLength := math.Maximum(1, math.Adjustment(math.Minimum(experimentsDetails.AppAffectPercentage, 100), len(appList)))
return appList[:newAppListLength]
}
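The rewritten helper above computes the number of targets from `APP_AFFECTED_PERC` via `math.Maximum`, `math.Adjustment`, and `math.Minimum` from the repository's math package. A standalone sketch of the same intent (the exact rounding behavior of `math.Adjustment` is an assumption here; this version rounds to the nearest integer, clamps the percentage to 100, and always targets at least one application):

```go
package main

import (
	"fmt"
	"math"
)

// targetCount sketches the percentage-based target selection above:
// clamp the percentage to at most 100, apply it to the list length,
// and target at least one application.
func targetCount(percentage, total int) int {
	if percentage > 100 {
		percentage = 100
	}
	n := int(math.Round(float64(total*percentage) / 100.0))
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	fmt.Println(targetCount(50, 4))  // 2
	fmt.Println(targetCount(0, 4))   // 1
	fmt.Println(targetCount(150, 4)) // 4
}
```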
//getDeploymentDetails is used to get the name and total number of replicas of the deployment
func getDeploymentDetails(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) ([]experimentTypes.ApplicationUnderTest, error) {
// getDeploymentDetails is used to get the name and total number of replicas of the deployment
func getDeploymentDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
deploymentList, err := appsv1DeploymentClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(deploymentList.Items) == 0 {
return nil, errors.Errorf("fail to get the deployments with matching labels, err: %v", err)
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(deploymentList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: deployment, labels: %s}", experimentsDetails.AppLabel), Reason: "no deployment found with matching labels"}
}
appsUnderTest := []experimentTypes.ApplicationUnderTest{}
var appsUnderTest []experimentTypes.ApplicationUnderTest
for _, app := range deploymentList.Items {
log.Infof("[Info]: Found deployment name '%s' with replica count '%d'", app.Name, int(*app.Spec.Replicas))
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
}
// Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails)
// Applying the APP_AFFECTED_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
}
//getStatefulsetDetails is used to get the name and total number of replicas of the statefulsets
func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) ([]experimentTypes.ApplicationUnderTest, error) {
// getStatefulsetDetails is used to get the name and total number of replicas of the statefulsets
func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails) ([]experimentTypes.ApplicationUnderTest, error) {
statefulsetList, err := appsv1StatefulsetClient.List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
if err != nil || len(statefulsetList.Items) == 0 {
return nil, errors.Errorf("fail to get the statefulsets with matching labels, err: %v", err)
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: err.Error()}
} else if len(statefulsetList.Items) == 0 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Target: fmt.Sprintf("{kind: statefulset, labels: %s}", experimentsDetails.AppLabel), Reason: "no statefulset found with matching labels"}
}
appsUnderTest := []experimentTypes.ApplicationUnderTest{}
@@ -150,11 +156,11 @@ func getStatefulsetDetails(experimentsDetails *experimentTypes.ExperimentDetails
appsUnderTest = append(appsUnderTest, experimentTypes.ApplicationUnderTest{AppName: app.Name, ReplicaCount: int(*app.Spec.Replicas)})
}
// Applying the APP_AFFECT_PERC variable to determine the total target deployments to scale
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails)
return getSliceOfTotalApplicationsTargeted(appsUnderTest, experimentsDetails), nil
}
//podAutoscalerChaosInDeployment scales up the replicas of deployment and verify the status
func podAutoscalerChaosInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// podAutoscalerChaosInDeployment scales up the replicas of deployment and verify the status
func podAutoscalerChaosInDeployment(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
@@ -163,33 +169,29 @@ func podAutoscalerChaosInDeployment(experimentsDetails *experimentTypes.Experime
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to get latest version of application deployment, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
}
// modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
log.Infof("Updating deployment '%s' to number of replicas '%d'", appUnderTest.ObjectMeta.Name, experimentsDetails.Replicas)
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale deployment :%s", err.Error())}
}
common.SetTargets(app.AppName, "injected", "deployment", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("fail to update the replica count of the deployment, err: %v", retryErr)
return retryErr
}
log.Info("[Info]: The application started scaling")
if err = deploymentStatusCheck(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("application deployment status check failed, err: %v", err)
}
return nil
return deploymentStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
}
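The scaling above wraps the get-modify-update cycle in client-go's `retries.RetryOnConflict`, which re-fetches and retries when the apiserver rejects a stale `resourceVersion`. A simplified, self-contained sketch of that pattern (the fixed-backoff loop here is an assumption; client-go uses configurable exponential backoff):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConflict simulates an apiserver optimistic-concurrency conflict.
var errConflict = errors.New("conflict")

// retryOnConflict re-runs fn while it returns a conflict error,
// sleeping between attempts, and gives up after maxRetries attempts.
func retryOnConflict(maxRetries int, backoff time.Duration, fn func() error) error {
	var err error
	for i := 0; i < maxRetries; i++ {
		if err = fn(); err == nil || !errors.Is(err, errConflict) {
			return err
		}
		time.Sleep(backoff)
	}
	return err
}

func main() {
	attempts := 0
	err := retryOnConflict(5, time.Millisecond, func() error {
		attempts++
		if attempts < 3 {
			return errConflict // simulate a stale resourceVersion
		}
		return nil // update succeeded
	})
	fmt.Println(attempts, err) // 3 <nil>
}
```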
//podAutoscalerChaosInStatefulset scales up the replicas of statefulset and verify the status
func podAutoscalerChaosInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// podAutoscalerChaosInStatefulset scales up the replicas of statefulset and verify the status
func podAutoscalerChaosInStatefulset(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Scale Application
retryErr := retries.RetryOnConflict(retries.DefaultRetry, func() error {
@@ -198,36 +200,31 @@ func podAutoscalerChaosInStatefulset(experimentsDetails *experimentTypes.Experim
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to get latest version of the target statefulset application , err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: err.Error()}
}
// modifying the replica count
appUnderTest.Spec.Replicas = int32Ptr(int32(experimentsDetails.Replicas))
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to scale statefulset :%s", err.Error())}
}
common.SetTargets(app.AppName, "injected", "statefulset", chaosDetails)
}
return nil
})
if retryErr != nil {
return errors.Errorf("fail to update the replica count of the statefulset application, err: %v", retryErr)
return retryErr
}
log.Info("[Info]: The application started scaling")
if err = statefulsetStatusCheck(experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails); err != nil {
return errors.Errorf("statefulset application status check failed, err: %v", err)
}
return nil
return statefulsetStatusCheck(ctx, experimentsDetails, clients, appsUnderTest, resultDetails, eventsDetails, chaosDetails)
}
// deploymentStatusCheck check the status of deployment and verify the available replicas
func deploymentStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func deploymentStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
isFailed := false
err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
@@ -236,33 +233,29 @@ func deploymentStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails
for _, app := range appsUnderTest {
deployment, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to find the deployment with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(deployment.Status.ReadyReplicas) != experimentsDetails.Replicas {
isFailed = true
return errors.Errorf("application %s is not scaled yet, the desired replica count is: %v and ready replica count is: %v", app.AppName, experimentsDetails.Replicas, deployment.Status.ReadyReplicas)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale deployment, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, deployment.Status.ReadyReplicas)}
}
}
isFailed = false
return nil
})
if isFailed {
if err = autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to perform the autoscaler recovery of the deployment, err: %v", err)
}
return errors.Errorf("fail to scale the deployment to the desired replica count in the given chaos duration")
}
if err != nil {
return err
if scaleErr := autoscalerRecoveryInDeployment(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
}
return stacktrace.Propagate(err, "failed to scale replicas")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
if duration < experimentsDetails.ChaosDuration {
log.Info("[Wait]: Waiting for completion of chaos duration")
@@ -273,11 +266,10 @@ func deploymentStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails
}
// statefulsetStatusCheck check the status of statefulset and verify the available replicas
func statefulsetStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func statefulsetStatusCheck(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
isFailed := false
err = retry.
Times(uint(experimentsDetails.ChaosDuration / experimentsDetails.Delay)).
@@ -286,30 +278,25 @@ func statefulsetStatusCheck(experimentsDetails *experimentTypes.ExperimentDetail
for _, app := range appsUnderTest {
statefulset, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
return errors.Errorf("fail to find the statefulset with name %v, err: %v", app.AppName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(statefulset.Status.ReadyReplicas) != experimentsDetails.Replicas {
isFailed = true
return errors.Errorf("application %s is not scaled yet, the desired replica count is: %v and ready replica count is: %v", app.AppName, experimentsDetails.Replicas, statefulset.Status.ReadyReplicas)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to scale statefulset, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, statefulset.Status.ReadyReplicas)}
}
}
isFailed = false
return nil
})
if isFailed {
if err = autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); err != nil {
return errors.Errorf("fail to perform the autoscaler recovery of the application, err: %v", err)
}
return errors.Errorf("fail to scale the application to the desired replica count in the given chaos duration")
}
if err != nil {
return err
if scaleErr := autoscalerRecoveryInStatefulset(experimentsDetails, clients, appsUnderTest, chaosDetails); scaleErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(scaleErr).Error())}
}
return stacktrace.Propagate(err, "failed to scale replicas")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -323,7 +310,7 @@ func statefulsetStatusCheck(experimentsDetails *experimentTypes.ExperimentDetail
return nil
}
//autoscalerRecoveryInDeployment rollback the replicas to initial values in deployment
// autoscalerRecoveryInDeployment rollback the replicas to initial values in deployment
func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
// Scale back to initial number of replicas
@@ -333,20 +320,20 @@ func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.Experime
for _, app := range appsUnderTest {
appUnderTest, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
-return errors.Errorf("fail to find the latest version of Application Deployment with name %v, err: %v", app.AppName, err)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1DeploymentClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
-return err
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in deployment :%s", err.Error())}
}
common.SetTargets(app.AppName, "reverted", "deployment", chaosDetails)
}
return nil
})
if retryErr != nil {
-return errors.Errorf("fail to rollback the deployment, err: %v", retryErr)
+return retryErr
}
log.Info("[Info]: Application started rolling back to original replica count")
@@ -357,11 +344,11 @@ func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.Experime
for _, app := range appsUnderTest {
applicationDeploy, err := appsv1DeploymentClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
-return errors.Errorf("fail to find the deployment with name %v, err: %v", app.AppName, err)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("[Info]: Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
-return errors.Errorf("fail to rollback to original replica count, err: %v", err)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: deployment, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback deployment scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
}
}
log.Info("[RollBack]: Application rollback to the initial number of replicas")
@@ -369,7 +356,7 @@ func autoscalerRecoveryInDeployment(experimentsDetails *experimentTypes.Experime
})
}
-//autoscalerRecoveryInStatefulset rollback the replicas to initial values in deployment
+// autoscalerRecoveryInStatefulset rollback the replicas to initial values in deployment
func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, appsUnderTest []experimentTypes.ApplicationUnderTest, chaosDetails *types.ChaosDetails) error {
// Scale back to initial number of replicas
@@ -379,20 +366,20 @@ func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.Experim
// RetryOnConflict uses exponential backoff to avoid exhausting the apiserver
appUnderTest, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
-return errors.Errorf("failed to find the latest version of Statefulset with name %v, err: %v", app.AppName, err)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
appUnderTest.Spec.Replicas = int32Ptr(int32(app.ReplicaCount)) // modify replica count
_, err = appsv1StatefulsetClient.Update(context.Background(), appUnderTest, metav1.UpdateOptions{})
if err != nil {
-return err
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, name: %s, namespace: %s}", app.AppName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert scaling in statefulset :%s", err.Error())}
}
common.SetTargets(app.AppName, "reverted", "statefulset", chaosDetails)
}
return nil
})
if retryErr != nil {
-return errors.Errorf("fail to rollback the statefulset, err: %v", retryErr)
+return retryErr
}
log.Info("[Info]: Application pod started rolling back")
@@ -403,11 +390,11 @@ func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.Experim
for _, app := range appsUnderTest {
applicationDeploy, err := appsv1StatefulsetClient.Get(context.Background(), app.AppName, metav1.GetOptions{})
if err != nil {
-return errors.Errorf("fail to get the statefulset with name %v, err: %v", app.AppName, err)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: err.Error()}
}
if int(applicationDeploy.Status.ReadyReplicas) != app.ReplicaCount {
log.Infof("Application ready replica count is: %v", applicationDeploy.Status.ReadyReplicas)
-return errors.Errorf("fail to roll back to original replica count, err: %v", err)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{kind: statefulset, namespace: %s, name: %s}", experimentsDetails.AppNS, app.AppName), Reason: fmt.Sprintf("failed to rollback statefulset scaling, the desired replica count is: %v and ready replica count is: %v", experimentsDetails.Replicas, applicationDeploy.Status.ReadyReplicas)}
}
}
log.Info("[RollBack]: Application roll back to initial number of replicas")
@@ -417,7 +404,7 @@ func autoscalerRecoveryInStatefulset(experimentsDetails *experimentTypes.Experim
func int32Ptr(i int32) *int32 { return &i }
-//abortPodAutoScalerChaos go routine will continuously watch for the abort signal for the entire chaos duration and generate the required events and result
+// abortPodAutoScalerChaos go routine will continuously watch for the abort signal for the entire chaos duration and generate the required events and result
func abortPodAutoScalerChaos(appsUnderTest []experimentTypes.ApplicationUnderTest, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) {
// signChan channel is used to transmit signal notifications.


@@ -1,13 +1,20 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
-clients "github.com/litmuschaos/litmus-go/pkg/clients"
+"github.com/litmuschaos/litmus-go/pkg/cerrors"
+"github.com/litmuschaos/litmus-go/pkg/telemetry"
+"github.com/palantir/stacktrace"
+"go.opentelemetry.io/otel"
+"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-cpu-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -16,36 +23,61 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
-"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
var inject chan os.Signal
+// PrepareCPUExecStress contains the chaos preparation and injection steps
+func PrepareCPUExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodCPUHogExecFault")
+defer span.End()
+// inject channel is used to transmit signal notifications.
+inject = make(chan os.Signal, 1)
+// Catch and relay certain signal(s) to inject channel.
+signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
+//Waiting for the ramp time before chaos injection
+if experimentsDetails.RampTime != 0 {
+log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
+common.WaitForDuration(experimentsDetails.RampTime)
+}
+//Starting the CPU stress experiment
+if err := experimentCPU(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+return stacktrace.Propagate(err, "could not stress cpu")
+}
+//Waiting for the ramp time after chaos injection
+if experimentsDetails.RampTime != 0 {
+log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
+common.WaitForDuration(experimentsDetails.RampTime)
+}
+return nil
+}
// stressCPU Uses the REST API to exec into the target container of the target pod
// The function will be constantly increasing the CPU utilisation until it reaches the maximum available or allowed number.
// Using the TOTAL_CHAOS_DURATION we will need to specify for how long this experiment will last
-func stressCPU(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, stressErr chan error) {
-// It will contains all the pod & container details required for exec command
+func stressCPU(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
+// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosInjectCmd}
-litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
-_, err := litmusexec.Exec(&execCommandDetails, clients, command)
+litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
+_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
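Each `stressCPU` call runs in its own goroutine and reports the result of its exec call on the shared `stressErr` channel; the injector then watches that channel for the chaos duration. A minimal sketch of this fan-out/first-error pattern, with a hypothetical `stress` function replacing the Kubernetes exec:

```go
package main

import (
	"fmt"
	"time"
)

// stress stands in for one stressCPU goroutine: it runs until its (fake)
// exec call returns and then reports the outcome on the shared channel.
func stress(id int, fail bool, stressErr chan<- error) {
	time.Sleep(10 * time.Millisecond) // pretend to burn CPU via exec
	if fail {
		stressErr <- fmt.Errorf("stressor %d: command terminated", id)
		return
	}
	stressErr <- nil
}

func main() {
	stressErr := make(chan error)
	for i := 0; i < 4; i++ {
		go stress(i, i == 2, stressErr)
	}
	// The injector's select loop only needs the first failure; remaining
	// goroutines are abandoned, as the real fault also exits on error.
	for i := 0; i < 4; i++ {
		if err := <-stressErr; err != nil {
			fmt.Println("first failure:", err)
			return
		}
	}
	fmt.Println("all stressors exited cleanly")
}
```

The channel is deliberately unbuffered in the fault: a stressor's exit blocks until the supervisor observes it, so no error can be silently lost.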
-//experimentCPU function orchestrates the experiment by calling the StressCPU function for every core, of every container, of every pod that is targeted
-func experimentCPU(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+// experimentCPU function orchestrates the experiment by calling the StressCPU function for every core, of every container, of every pod that is targeted
+func experimentCPU(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
-if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
-return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
+if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
-return err
+return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@@ -54,30 +86,31 @@ func experimentCPU(experimentsDetails *experimentTypes.ExperimentDetails, client
}
log.Infof("Target pods list for chaos, %v", podNames)
-experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
+experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
-if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
-if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
+if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
+return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
-return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
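The `switch` on `experimentsDetails.Sequence` above is the standard litmus dispatch between serial and parallel injection, with unknown values rejected up front. A small standalone sketch of that dispatch (hypothetical `runSequence` helper, not the litmus API):

```go
package main

import (
	"fmt"
	"strings"
)

// runSequence mirrors the serial/parallel dispatch: the SEQUENCE tunable
// is lower-cased first, and anything unrecognised is rejected up front.
func runSequence(sequence string, serial, parallel func() error) error {
	switch strings.ToLower(sequence) {
	case "serial":
		return serial()
	case "parallel":
		return parallel()
	default:
		return fmt.Errorf("'%s' sequence is not supported", sequence)
	}
}

func main() {
	serial := func() error { fmt.Println("one target at a time"); return nil }
	parallel := func() error { fmt.Println("all targets at once"); return nil }

	_ = runSequence("Parallel", serial, parallel) // prints "all targets at once"
	fmt.Println(runSequence("rolling", serial, parallel))
}
```

Lower-casing makes the tunable case-insensitive, so `SERIAL`, `Serial`, and `serial` all select the same mode.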
// injectChaosInSerialMode stressed the cpu of all target application serially (one by one)
-func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInSerialMode")
+defer span.End()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -109,10 +142,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
-experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
-if err != nil {
-return errors.Errorf("unable to get the target container name, err: %v", err)
-}
+experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@@ -122,7 +152,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
-go stressCPU(experimentsDetails, pod.Name, clients, stressErr)
+go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
@@ -142,18 +172,20 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Warn("Chaos process OOM killed")
return nil
}
-return err
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
-err := killStressCPUSerial(experimentsDetails, pod.Name, clients, chaosDetails)
-if err != nil {
+if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
-failStep := "Chaos injection stopped!"
-types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
-result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
+err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
+failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
+types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
+if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
+log.Errorf("failed to update chaos result %s", err.Error())
+}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@@ -162,8 +194,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
break loop
}
}
-if err := killStressCPUSerial(experimentsDetails, pod.Name, clients, chaosDetails); err != nil {
-return err
+if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
+return stacktrace.Propagate(err, "could not revert cpu stress")
}
}
}
@@ -171,13 +203,16 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode stressed the cpu of all target application in parallel mode (all at once)
-func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodCPUHogExecFaultInParallelMode")
+defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -205,10 +240,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
-experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
-if err != nil {
-return errors.Errorf("unable to get the target container name, err: %v", err)
-}
+experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@@ -217,7 +249,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"CPU CORE": experimentsDetails.CPUcores,
})
for i := 0; i < experimentsDetails.CPUcores; i++ {
-go stressCPU(experimentsDetails, pod.Name, clients, stressErr)
+go stressCPU(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
}
@@ -238,7 +270,7 @@ loop:
log.Warn("Chaos process OOM killed")
return nil
}
-return err
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress cpu of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
@@ -246,9 +278,12 @@ loop:
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
-failStep := "Chaos injection stopped!"
-types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
-result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
+err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
+failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
+types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
+if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
+log.Errorf("failed to update chaos result %s", err.Error())
+}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@@ -260,43 +295,19 @@ loop:
return killStressCPUParallel(experimentsDetails, targetPodList, clients, chaosDetails)
}
-//PrepareCPUExecStress contains the chaos prepration and injection steps
-func PrepareCPUExecStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
-// inject channel is used to transmit signal notifications.
-inject = make(chan os.Signal, 1)
-// Catch and relay certain signal(s) to inject channel.
-signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
-//Waiting for the ramp time before chaos injection
-if experimentsDetails.RampTime != 0 {
-log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
-common.WaitForDuration(experimentsDetails.RampTime)
-}
-//Starting the CPU stress experiment
-if err := experimentCPU(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
-return err
-}
-//Waiting for the ramp time after chaos injection
-if experimentsDetails.RampTime != 0 {
-log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
-common.WaitForDuration(experimentsDetails.RampTime)
-}
-return nil
-}
// killStressCPUSerial function to kill a stress process running inside target container
-// Triggered by either timeout of chaos duration or termination of the experiment
-func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
-// It will contains all the pod & container details required for exec command
+//
+// Triggered by either timeout of chaos duration or termination of the experiment
+func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
+// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", experimentsDetails.ChaosKillCmd}
-litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
-_, err := litmusexec.Exec(&execCommandDetails, clients, command)
+litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, ns)
+out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
-return errors.Errorf("Unable to kill the stress process in %v pod, err: %v", podName, err)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, ns), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil
@@ -305,12 +316,14 @@ func killStressCPUSerial(experimentsDetails *experimentTypes.ExperimentDetails,
// killStressCPUParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressCPUParallel(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
+var errList []string
for _, pod := range targetPodList.Items {
-if err := killStressCPUSerial(experimentsDetails, pod.Name, clients, chaosDetails); err != nil {
-return err
+if err := killStressCPUSerial(experimentsDetails, pod.Name, pod.Namespace, clients, chaosDetails); err != nil {
+errList = append(errList, err.Error())
}
}
+if len(errList) != 0 {
+return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
+}
return nil
}


@@ -2,27 +2,32 @@ package lib
import (
"context"
"fmt"
"strconv"
"strings"
"time"
-clients "github.com/litmuschaos/litmus-go/pkg/clients"
+"github.com/litmuschaos/litmus-go/pkg/cerrors"
+"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
+"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
-"github.com/litmuschaos/litmus-go/pkg/utils/annotation"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
-"github.com/pkg/errors"
+"github.com/litmuschaos/litmus-go/pkg/workloads"
+"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
-apiv1 "k8s.io/api/core/v1"
+"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
-//PreparePodDelete contains the prepration steps before chaos injection
-func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+// PreparePodDelete contains the preparation steps before chaos injection
+func PreparePodDelete(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDeleteFault")
+defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -30,7 +35,7 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
common.WaitForDuration(experimentsDetails.RampTime)
}
-//setup the tunables if provided in range
+//set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
@@ -40,15 +45,15 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
-if err := injectChaosInSerialMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
-return err
+if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
+return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
-if err := injectChaosInParallelMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
-return err
+if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
+return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
-return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -60,14 +65,13 @@ func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, cli
}
// injectChaosInSerialMode delete the target application pods serial mode(one by one)
-func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
+func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
+ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInSerialMode")
+defer span.End()
-targetPodList := apiv1.PodList{}
var err error
-var podsAffectedPerc int
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
-if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
+if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -80,49 +84,26 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
-if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
-return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
+if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
-podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc)
-if experimentsDetails.NodeLabel == "" {
-targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
-if err != nil {
-return err
-}
-} else {
-if experimentsDetails.TargetPods == "" {
-targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
-if err != nil {
-return err
-}
-} else {
-log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
-targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
-if err != nil {
-return err
-}
-}
-}
+targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
+if err != nil {
+return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
-if chaosDetails.AppDetail.Kind != "" {
-for _, pod := range targetPodList.Items {
-parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
-if err != nil {
-return err
-}
-common.SetParentName(parentName, chaosDetails)
-}
-for _, target := range chaosDetails.ParentsResources {
-common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
-}
-}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
+kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
+if err != nil {
+return stacktrace.Propagate(err, "could not get pod owner name and kind")
+}
+common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
+for _, target := range chaosDetails.ParentsResources {
+common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
+}
log.Infof("Target pods list: %v", podNames)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
@@ -137,18 +118,18 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"PodName": pod.Name})
if experimentsDetails.Force {
-err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
+err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
-err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
+err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
-return err
+return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
-return err
+return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
@@ -161,8 +142,15 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
@@ -176,14 +164,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode delete the target application pods in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDeleteFaultInParallelMode")
defer span.End()
targetPodList := apiv1.PodList{}
var err error
var podsAffectedPerc int
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -196,49 +183,25 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
for duration < experimentsDetails.ChaosDuration {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
}
podsAffectedPerc, _ = strconv.Atoi(experimentsDetails.PodsAffectedPerc)
if experimentsDetails.NodeLabel == "" {
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
} else {
if experimentsDetails.TargetPods == "" {
targetPodList, err = common.GetPodListFromSpecifiedNodes(experimentsDetails.TargetPods, podsAffectedPerc, experimentsDetails.NodeLabel, clients, chaosDetails)
if err != nil {
return err
}
} else {
log.Infof("TARGET_PODS env is provided, overriding the NODE_LABEL input")
targetPodList, err = common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
// deriving the parent name of the target resources
if chaosDetails.AppDetail.Kind != "" {
for _, pod := range targetPodList.Items {
parentName, err := annotation.GetParentName(clients, pod, chaosDetails)
if err != nil {
return err
}
common.SetParentName(parentName, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target, "targeted", chaosDetails.AppDetail.Kind, chaosDetails)
}
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
kind, parentName, err := workloads.GetPodOwnerTypeAndName(&pod, clients.DynamicClient)
if err != nil {
return stacktrace.Propagate(err, "could not get pod owner name and kind")
}
common.SetParentName(parentName, kind, pod.Namespace, chaosDetails)
}
for _, target := range chaosDetails.ParentsResources {
common.SetTargets(target.Name, "targeted", target.Kind, chaosDetails)
}
log.Infof("Target pods list: %v", podNames)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
@@ -253,19 +216,19 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"PodName": pod.Name})
if experimentsDetails.Force {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{GracePeriodSeconds: &GracePeriod})
} else {
err = clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
err = clients.KubeClient.CoreV1().Pods(pod.Namespace).Delete(context.Background(), pod.Name, v1.DeleteOptions{})
}
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to delete the target pod: %s", err.Error())}
}
}
switch chaosDetails.Randomness {
case true:
if err := common.RandomInterval(experimentsDetails.ChaosInterval); err != nil {
return err
return stacktrace.Propagate(err, "could not get random chaos interval")
}
default:
//Waiting for the chaos interval after chaos injection
@@ -278,8 +241,15 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Verify the status of pod after the chaos injection
log.Info("[Status]: Verification for the recreation of application pod")
if err = status.CheckApplicationStatus(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return err
for _, parent := range chaosDetails.ParentsResources {
target := types.AppDetails{
Names: []string{parent.Name},
Kind: parent.Kind,
Namespace: parent.Namespace,
}
if err = status.CheckUnTerminatedPodStatusesByWorkloadName(target, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not check pod statuses by workload names")
}
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
@@ -289,8 +259,8 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
return nil
}
//SetChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value.
// SetChaosTunables will setup a random value within a given range of values
// If the value is not provided in range it'll setup the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.PodsAffectedPerc = common.ValidateRange(experimentsDetails.PodsAffectedPerc)
experimentsDetails.Sequence = common.GetRandomSequence(experimentsDetails.Sequence)


@@ -2,8 +2,12 @@ package helper
import (
"bytes"
"context"
"fmt"
"github.com/kyokomi/emoji"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/exec"
"os/signal"
@@ -24,6 +28,7 @@ import (
var (
abort, injectAbort chan os.Signal
err error
)
const (
@@ -32,7 +37,9 @@
)
// Helper injects the dns chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodDNSFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -63,23 +70,70 @@ func Helper(clients clients.ClientSets) {
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := preparePodDNSChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
//preparePodDNSChaos contains the preparation steps before chaos injection
// preparePodDNSChaos contains the preparation steps before chaos injection
func preparePodDNSChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients)
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return err
return stacktrace.Propagate(err, "could not parse targets")
}
// extract out the pid of the target container
pid, err := common.GetPID(experimentsDetails.ContainerRuntime, containerID, experimentsDetails.SocketPath)
if err != nil {
return err
var targets []targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-injectAbort:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
done := make(chan error, 1)
for index, t := range targets {
targets[index].Cmd, err = injectChaos(experimentsDetails, t)
if err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := terminateProcess(t); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
@@ -89,91 +143,136 @@ func preparePodDNSChaos(experimentsDetails *experimentTypes.ExperimentDetails, c
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// prepare dns interceptor
commandTemplate := fmt.Sprintf("sudo TARGET_PID=%d CHAOS_TYPE=%s SPOOF_MAP='%s' TARGET_HOSTNAMES='%s' CHAOS_DURATION=%d MATCH_SCHEME=%s nsutil -p -n -t %d -- dns_interceptor", pid, experimentsDetails.ChaosType, experimentsDetails.SpoofMap, experimentsDetails.TargetHostNames, experimentsDetails.ChaosDuration, experimentsDetails.MatchScheme, pid)
cmd := exec.Command("/bin/bash", "-c", commandTemplate)
log.Info(cmd.String())
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
// injecting dns chaos inside target container
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
go func() {
select {
case <-injectAbort:
log.Info("[Chaos]: Abort received, skipping chaos injection")
default:
err = cmd.Run()
if err != nil {
log.Fatalf("dns interceptor failed : %v", err)
var errList []string
for _, t := range targets {
if err := t.Cmd.Wait(); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
log.Errorf("err: %v", strings.Join(errList, ", "))
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
}
done <- nil
}()
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
if revertErr := terminateProcess(cmd); revertErr != nil {
return fmt.Errorf("failed to revert and annotate the result, err: %v", fmt.Sprintf("%s, %s", err.Error(), revertErr.Error()))
}
return err
}
// check the timeout for the command
// Note: timeout will occur when the process hasn't completed even 30s after the chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
timeChan := time.Tick(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
log.Infof("[Chaos]: Waiting for %vs", experimentsDetails.ChaosDuration)
// either wait for abort signal or chaos duration
select {
case <-abort:
log.Info("[Chaos]: Killing process started because of terminated signal received")
case <-timeChan:
log.Info("[Chaos]: Stopping the experiment, chaos duration over")
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
var errList []string
for _, t := range targets {
if err = terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
case doneErr := <-done:
select {
case <-injectAbort:
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
default:
log.Info("[Info]: Reverting Chaos")
var errList []string
for _, t := range targets {
if err := terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return doneErr
}
}
log.Info("Chaos Revert Started")
// retry thrice for the chaos revert
return nil
}
func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, t targetDetails) (*exec.Cmd, error) {
// prepare dns interceptor
var out bytes.Buffer
commandTemplate := fmt.Sprintf("sudo TARGET_PID=%d CHAOS_TYPE=%s SPOOF_MAP='%s' TARGET_HOSTNAMES='%s' CHAOS_DURATION=%d MATCH_SCHEME=%s nsutil -p -n -t %d -- dns_interceptor", t.Pid, experimentsDetails.ChaosType, experimentsDetails.SpoofMap, experimentsDetails.TargetHostNames, experimentsDetails.ChaosDuration, experimentsDetails.MatchScheme, t.Pid)
cmd := exec.Command("/bin/bash", "-c", commandTemplate)
log.Info(cmd.String())
cmd.Stdout = &out
cmd.Stderr = &out
if err = cmd.Start(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: experimentsDetails.ChaosPodName, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to inject chaos: %s", out.String())}
}
return cmd, nil
}
func terminateProcess(t targetDetails) error {
// kill command
killTemplate := fmt.Sprintf("sudo kill %d", t.Cmd.Process.Pid)
kill := exec.Command("/bin/bash", "-c", killTemplate)
var out bytes.Buffer
kill.Stderr = &out
kill.Stdout = &out
if err = kill.Run(); err != nil {
if strings.Contains(strings.ToLower(out.String()), ProcessAlreadyKilled) {
return nil
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: fmt.Sprintf("failed to revert chaos %s", out.String())}
} else {
log.Errorf("dns interceptor process stopped")
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
}
return nil
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(targets []targetDetails, resultName, chaosNS string) {
<-abort
log.Info("[Chaos]: Killing process started because of terminated signal received")
log.Info("[Abort]: Chaos Revert Started")
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if cmd.Process == nil {
log.Infof("cannot kill dns interceptor, process not started. Retrying in 1sec...")
} else {
log.Infof("killing dns interceptor with pid %v", cmd.Process.Pid)
if err := terminateProcess(cmd); err != nil {
return err
for _, t := range targets {
if err = terminateProcess(t); err != nil {
log.Errorf("unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Info("Chaos Revert Completed")
return nil
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
func terminateProcess(cmd *exec.Cmd) error {
// kill command
killTemplate := fmt.Sprintf("sudo kill %d", cmd.Process.Pid)
kill := exec.Command("/bin/bash", "-c", killTemplate)
var stderr bytes.Buffer
kill.Stderr = &stderr
if err := kill.Run(); err != nil {
if strings.Contains(strings.ToLower(stderr.String()), ProcessAlreadyKilled) {
return nil
}
log.Errorf("unable to kill dns interceptor process %v, err :%v", emoji.Sprint(":cry:"), err)
} else {
log.Errorf("dns interceptor process stopped")
}
return nil
}
//getENV fetches all the env variables from the runner pod
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "60"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@@ -186,3 +285,14 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosType = types.Getenv("CHAOS_TYPE", "error")
experimentDetails.SocketPath = types.Getenv("SOCKET_PATH", "")
}
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
CommandPid int
Cmd *exec.Cmd
Source string
}


@@ -2,33 +2,40 @@ package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-dns-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodDNSFault")
defer span.End()
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@@ -47,41 +54,41 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode inject the DNS Chaos in all target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInSerialMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -91,10 +98,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
@@ -102,33 +106,15 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for pod-dns chaos
log.Info("[Cleanup]: Deleting the the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("Unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@@ -136,78 +122,53 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode inject the DNS Chaos in all target application in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInParallelMode")
defer span.End()
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform DNS Chaos
for _, pod := range targetPodList.Items {
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": pod.Name,
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := common.GetRunID()
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting all the helper pod for pod-dns chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("Unable to delete the helper pods, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
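The parallel injector above packs all targets on a node into a single `name:namespace:container` list joined by `;`, which the helper later unpacks (via `common.ParseTargets`). A round-trip sketch; `encodeTargets`/`decodeTargets` are illustrative names, not the real API:

```go
package main

import (
	"fmt"
	"strings"
)

type target struct{ Name, Namespace, Container string }

// encodeTargets joins per-pod targets the way the parallel injector does:
// "name:namespace:container" entries separated by ";".
func encodeTargets(ts []target) string {
	var parts []string
	for _, t := range ts {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", t.Name, t.Namespace, t.Container))
	}
	return strings.Join(parts, ";")
}

// decodeTargets is the inverse the helper presumably performs (sketch only;
// the real parser lives in pkg/utils/common).
func decodeTargets(s string) ([]target, error) {
	var ts []target
	for _, p := range strings.Split(s, ";") {
		f := strings.Split(p, ":")
		if len(f) != 3 {
			return nil, fmt.Errorf("malformed target %q", p)
		}
		ts = append(ts, target{f[0], f[1], f[2]})
	}
	return ts, nil
}

func main() {
	in := []target{{"nginx-0", "default", "nginx"}, {"nginx-1", "default", "nginx"}}
	enc := encodeTargets(in)
	fmt.Println(enc) // nginx-0:default:nginx;nginx-1:default:nginx
	out, _ := decodeTargets(enc)
	fmt.Println(len(out)) // 2
}
```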
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, podName, nodeName, runID, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodDNSFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
@@ -240,7 +201,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name dns-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName),
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "cri-socket",
@@ -255,18 +216,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the env required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@@ -279,6 +245,8 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("MATCH_SCHEME", experimentsDetails.MatchScheme).
SetEnv("CHAOS_TYPE", experimentsDetails.ChaosType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
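getPodEnv assembles the helper's environment with a chained SetEnv builder. The following is a simplified, hypothetical re-implementation of that builder for illustration (it assumes, as the chained style above suggests, that empty values are skipped so unset options do not leak into the helper pod spec):

```go
package main

import "fmt"

// EnvVar mirrors the shape of a Kubernetes container env entry.
type EnvVar struct {
	Name, Value string
}

// ENVDetails is a minimal sketch of the common.ENVDetails accumulator.
type ENVDetails struct {
	ENV []EnvVar
}

// SetEnv appends a variable and returns the receiver, enabling the
// chained style seen in getPodEnv; empty values are skipped.
func (e *ENVDetails) SetEnv(name, value string) *ENVDetails {
	if value != "" {
		e.ENV = append(e.ENV, EnvVar{Name: name, Value: value})
	}
	return e
}

func main() {
	var env ENVDetails
	env.SetEnv("TARGETS", "pod-a:nginx").
		SetEnv("CHAOS_NAMESPACE", "litmus").
		SetEnv("INSTANCE_ID", "") // skipped: empty value
	fmt.Println(len(env.ENV))
}
```

The builder keeps the pod-spec construction declarative: each fault only lists the variables it needs, in order, without boilerplate nil checks at the call site.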

View File

@@ -1,6 +1,7 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
@@ -8,7 +9,13 @@ import (
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-fio-stress/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -16,15 +23,36 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
// PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodFIOStressFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Fio stress experiment
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressStorage uses the REST API to exec into the target container of the target pod
// The function constantly increases the storage utilisation until it reaches the maximum available or allowed amount.
// TOTAL_CHAOS_DURATION specifies how long this experiment will last
func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName string, clients clients.ClientSets, stressErr chan error) {
func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName, ns string, clients clients.ClientSets, stressErr chan error) {
log.Infof("The storage consumption is: %vM", experimentDetails.Size)
@@ -37,23 +65,24 @@ func stressStorage(experimentDetails *experimentTypes.ExperimentDetails, podName
log.Infof("Running the command:\n%v", fioCmd)
command := []string{"/bin/sh", "-c", fioCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentDetails.TargetContainer, experimentDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentDetails.TargetContainer, ns)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
//experimentExecution function orchestrates the experiment by calling the StressStorage function, of every container, of every pod that is targeted
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// experimentExecution function orchestrates the experiment by calling the StressStorage function, of every container, of every pod that is targeted
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide either of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@@ -62,31 +91,33 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
}
log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the storage of all target applications in serial mode (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInSerialMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -103,10 +134,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@@ -114,7 +142,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"Target Pod": pod.Name,
"Space Consumption(MB)": experimentsDetails.Size,
})
go stressStorage(experimentsDetails, pod.Name, clients, stressErr)
go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
@@ -130,19 +158,25 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
case err := <-stressErr:
// skip the execution and mark the result as fail if any error other than 137 is received while executing the stress command
// error code 137 (oom kill) is ignored: further execution is skipped and the result is marked as pass
// oom kill occurs if stor to be stressed exceed than the resource limit for the target container
// oom kill occurs if resource to be stressed exceed than the resource limit for the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress storage of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@@ -151,21 +185,23 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
break loop
}
}
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
if err := killStressSerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
// injectChaosInParallelMode stresses the storage of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodFIOStressFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -182,10 +218,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
}
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@@ -193,7 +226,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"Target Pod": pod.Name,
"Storage Consumption(MB)": experimentsDetails.Size,
})
go stressStorage(experimentsDetails, pod.Name, clients, stressErr)
go stressStorage(experimentsDetails, pod.Name, pod.Namespace, clients, stressErr)
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
@@ -209,19 +242,25 @@ loop:
case err := <-stressErr:
// skip the execution and mark the result as fail if any error other than 137 is received while executing the stress command
// error code 137 (oom kill) is ignored: further execution is skipped and the result is marked as pass
// oom kill occurs if stor to be stressed exceed than the resource limit for the target container
// oom kill occurs if resource to be stressed exceed than the resource limit for the target container
if err != nil {
if strings.Contains(err.Error(), "137") {
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to inject chaos: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@@ -229,58 +268,41 @@ loop:
break loop
}
}
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients); err != nil {
return err
if err := killStressParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
return nil
}
//PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Fio stress experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// killStressSerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
//
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressSerial(containerName, podName, namespace, KillCmd string, clients clients.ClientSets) error {
// It will contains all the pod & container details required for exec command
// It will contain all the pod & container details required for exec command
execCommandDetails := litmusexec.PodDetails{}
command := []string{"/bin/sh", "-c", KillCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to kill stress process inside target container, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
return nil
}
// killStressParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressParallel(containerName string, targetPodList corev1.PodList, namespace, KillCmd string, clients clients.ClientSets) error {
func killStressParallel(containerName string, targetPodList corev1.PodList, KillCmd string, clients clients.ClientSets) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressSerial(containerName, pod.Name, namespace, KillCmd, clients); err != nil {
return err
if err := killStressSerial(containerName, pod.Name, pod.Namespace, KillCmd, clients); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}

View File

@@ -1,6 +1,7 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
@@ -9,7 +10,12 @@ import (
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-memory-hog-exec/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -18,13 +24,39 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
var inject chan os.Signal
// PrepareMemoryExecStress contains the chaos preparation and injection steps
func PrepareMemoryExecStress(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodMemoryHogExecFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
if err := experimentMemory(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not stress memory")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// stressMemory uses the REST API to exec into the target container of the target pod
// The function constantly increases the memory utilisation until it reaches the maximum available or allowed amount.
// TOTAL_CHAOS_DURATION specifies how long this experiment will last
@@ -39,22 +71,23 @@ func stressMemory(MemoryConsumption, containerName, podName, namespace string, c
command := []string{"/bin/sh", "-c", ddCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
_, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
stressErr <- err
}
//experimentMemory function orchestrates the experiment by calling the StressMemory function, of every container, of every pod that is targeted
func experimentMemory(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// experimentMemory function orchestrates the experiment by calling the StressMemory function, of every container, of every pod that is targeted
func experimentMemory(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@@ -63,30 +96,31 @@ func experimentMemory(experimentsDetails *experimentTypes.ExperimentDetails, cli
}
log.Infof("Target pods list for chaos, %v", podNames)
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode stresses the memory of all target applications serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInSerialMode")
defer span.End()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -118,10 +152,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@@ -129,7 +160,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"Target Pod": pod.Name,
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr)
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
@@ -148,17 +179,20 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("podName: %s, namespace: %s, container: %s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@@ -167,8 +201,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
break loop
}
}
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
return err
if err := killStressMemorySerial(experimentsDetails.TargetContainer, pod.Name, pod.Namespace, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not revert memory stress")
}
}
}
@@ -176,13 +210,15 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode stresses the memory of all target applications in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodMemoryHogExecFaultInParallelMode")
defer span.End()
// creating err channel to receive the error from the go routine
stressErr := make(chan error)
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -212,10 +248,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Get the target container name of the application pod
//It checks the empty target container for the first iteration only
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Chaos]: The Target application details", logrus.Fields{
@@ -224,7 +257,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
"Memory Consumption(MB)": experimentsDetails.MemoryConsumption,
})
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, experimentsDetails.AppNS, clients, stressErr)
go stressMemory(strconv.Itoa(experimentsDetails.MemoryConsumption), experimentsDetails.TargetContainer, pod.Name, pod.Namespace, clients, stressErr)
}
}
@@ -243,13 +276,20 @@ loop:
log.Warn("Chaos process OOM killed")
return nil
}
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to stress memory of target pod: %s", err.Error())}
}
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
if err := killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails); err != nil {
log.Errorf("Error in Kill stress after abortion, err: %v", err)
}
// updating the chaosresult after stopped
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
if err := result.ChaosResult(chaosDetails, clients, resultDetails, "EOT"); err != nil {
log.Errorf("failed to update chaos result %s", err.Error())
}
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
case <-endTime:
@@ -257,36 +297,12 @@ loop:
break loop
}
}
return killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.AppNS, experimentsDetails.ChaosKillCmd, clients, chaosDetails)
}
//PrepareMemoryExecStress contains the chaos preparation and injection steps
func PrepareMemoryExecStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Memory stress experiment
if err := experimentMemory(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
return killStressMemoryParallel(experimentsDetails.TargetContainer, targetPodList, experimentsDetails.ChaosKillCmd, clients, chaosDetails)
}
// killStressMemorySerial function to kill a stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
// It contains all the pod & container details required for the exec command
execCommandDetails := litmusexec.PodDetails{}
@@ -294,9 +310,9 @@ func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string
command := []string{"/bin/sh", "-c", memFreeCmd}
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, containerName, namespace)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
out, _, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("Unable to kill stress process inside target container, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, namespace), Reason: fmt.Sprintf("failed to revert chaos: %s", out)}
}
common.SetTargets(podName, "reverted", "pod", chaosDetails)
return nil
@@ -304,13 +320,15 @@ func killStressMemorySerial(containerName, podName, namespace, memFreeCmd string
// killStressMemoryParallel function to kill all the stress process running inside target container
// Triggered by either timeout of chaos duration or termination of the experiment
func killStressMemoryParallel(containerName string, targetPodList corev1.PodList, namespace, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
func killStressMemoryParallel(containerName string, targetPodList corev1.PodList, memFreeCmd string, clients clients.ClientSets, chaosDetails *types.ChaosDetails) error {
var errList []string
for _, pod := range targetPodList.Items {
if err := killStressMemorySerial(containerName, pod.Name, namespace, memFreeCmd, clients, chaosDetails); err != nil {
return err
if err := killStressMemorySerial(containerName, pod.Name, pod.Namespace, memFreeCmd, clients, chaosDetails); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}

View File

@@ -1,12 +1,14 @@
package lib
import (
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/palantir/stacktrace"
"strings"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"github.com/pkg/errors"
"gopkg.in/yaml.v2"
corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1"
@@ -52,12 +54,12 @@ func (np *NetworkPolicy) getNetworkPolicyDetails(experimentsDetails *experimentT
// sets the ports for the traffic control
if err := np.setPort(experimentsDetails.PORTS); err != nil {
return err
return stacktrace.Propagate(err, "could not set port")
}
// sets the destination ips for which the traffic should be blocked
if err := np.setExceptIPs(experimentsDetails); err != nil {
return err
return stacktrace.Propagate(err, "could not set ips")
}
// sets the egress traffic rules
@@ -138,11 +140,11 @@ func (np *NetworkPolicy) setNamespaceSelector(nsLabel string) *NetworkPolicy {
// setPort sets all the protocols and ports
func (np *NetworkPolicy) setPort(p string) error {
ports := []networkv1.NetworkPolicyPort{}
var ports []networkv1.NetworkPolicyPort
var port Port
// unmarshal the protocols and ports from the env
if err := yaml.Unmarshal([]byte(strings.TrimSpace(parseCommand(p))), &port); err != nil {
return errors.Errorf("Unable to unmarshal, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("failed to unmarshal ports: %s", err.Error())}
}
// sets all the tcp ports
@@ -182,7 +184,7 @@ func (np *NetworkPolicy) setExceptIPs(experimentsDetails *experimentTypes.Experi
// get all the target ips
destinationIPs, err := network_chaos.GetTargetIps(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, clients.ClientSets{}, false)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get destination ips")
}
ips := strings.Split(destinationIPs, ",")
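The `setExceptIPs` step splits the resolved destination IPs on commas before feeding them into the policy's except list. A rough stdlib-only sketch of that transformation (the `buildExceptCIDRs` helper and the /32 handling are illustrative assumptions, not the actual litmus implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// buildExceptCIDRs converts "1.2.3.4,10.0.0.7" into /32 CIDR entries,
// the shape a NetworkPolicy IPBlock "except" list expects for single IPs.
func buildExceptCIDRs(destinationIPs string) []string {
	var cidrs []string
	for _, ip := range strings.Split(destinationIPs, ",") {
		ip = strings.TrimSpace(ip)
		if ip == "" {
			continue // tolerate trailing commas and blank entries
		}
		cidrs = append(cidrs, ip+"/32")
	}
	return cidrs
}

func main() {
	fmt.Println(buildExceptCIDRs("1.2.3.4, 10.0.0.7"))
}
```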

View File

@@ -2,11 +2,18 @@ package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-network-partition/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -15,7 +22,7 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
networkv1 "k8s.io/api/networking/v1"
@@ -26,8 +33,10 @@ var (
inject, abort chan os.Signal
)
//PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareAndInjectChaos contains the preparation & injection steps
func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkPartitionFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -40,13 +49,14 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// validate the appLabels
if chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide the appLabel")
if chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide the appLabel"}
}
// Get the target pod details for the chaos execution
targetPodList, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: experimentsDetails.AppLabel})
targetPodList, err := common.GetPodList("", 100, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@@ -56,7 +66,7 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
log.Infof("Target pods list for chaos, %v", podNames)
// generate a unique string
runID := common.GetRunID()
runID := stringutils.GetRunID()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -67,7 +77,7 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
// collect all the data for the network policy
np := initialize()
if err := np.getNetworkPolicyDetails(experimentsDetails); err != nil {
return err
return stacktrace.Propagate(err, "could not get network policy details")
}
//DISPLAY THE NETWORK POLICY DETAILS
@@ -81,11 +91,11 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
})
// watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, clients, chaosDetails, resultDetails, targetPodList, runID)
go abortWatcher(experimentsDetails, clients, chaosDetails, resultDetails, &targetPodList, runID)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -96,8 +106,8 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
os.Exit(0)
default:
// creating the network policy to block the traffic
if err := createNetworkPolicy(experimentsDetails, clients, np, runID); err != nil {
return err
if err := createNetworkPolicy(ctx, experimentsDetails, clients, np, runID); err != nil {
return stacktrace.Propagate(err, "could not create network policy")
}
// updating chaos status to injected for the target pods
for _, pod := range targetPodList.Items {
@@ -106,16 +116,16 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
}
// verify the presence of network policy inside cluster
if err := checkExistanceOfPolicy(experimentsDetails, clients, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return err
if err := checkExistenceOfPolicy(experimentsDetails, clients, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return stacktrace.Propagate(err, "could not check existence of network policy")
}
log.Infof("[Wait]: Wait for %v chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// deleting the network policy after chaos duration over
if err := deleteNetworkPolicy(experimentsDetails, clients, targetPodList, chaosDetails, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return err
if err := deleteNetworkPolicy(experimentsDetails, clients, &targetPodList, chaosDetails, experimentsDetails.Timeout, experimentsDetails.Delay, runID); err != nil {
return stacktrace.Propagate(err, "could not delete network policy")
}
// updating chaos status to reverted for the target pods
@@ -134,7 +144,9 @@ func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails
// createNetworkPolicy creates the network policy in the application namespace
// it blocks ingress/egress traffic for the targeted application for specific/all IPs
func createNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, networkPolicy *NetworkPolicy, runID string) error {
func createNetworkPolicy(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, networkPolicy *NetworkPolicy, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodNetworkPartitionFault")
defer span.End()
np := &networkv1.NetworkPolicy{
ObjectMeta: v1.ObjectMeta{
@@ -157,7 +169,10 @@ func createNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
}
_, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Create(context.Background(), np, v1.CreateOptions{})
return err
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: fmt.Sprintf("failed to create network policy: %s", err.Error())}
}
return nil
}
// deleteNetworkPolicy deletes the network policy and wait until the network policy deleted completely
@@ -165,7 +180,7 @@ func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
name := experimentsDetails.ExperimentName + "-np-" + runID
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
if err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).Delete(context.Background(), name, v1.DeleteOptions{}); err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{name: %s, namespace: %s}", name, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to delete network policy: %s", err.Error())}
}
err := retry.
@@ -173,8 +188,10 @@ func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil || len(npList.Items) != 0 {
return errors.Errorf("Unable to delete the network policy, err: %v", err)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) != 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "network policies are not deleted within timeout"}
}
return nil
})
@@ -189,8 +206,8 @@ func deleteNetworkPolicy(experimentsDetails *experimentTypes.ExperimentDetails,
return nil
}
// checkExistanceOfPolicy validate the presence of network policy inside the application namespace
func checkExistanceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, timeout, delay int, runID string) error {
// checkExistenceOfPolicy validates the presence of the network policy inside the application namespace
func checkExistenceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, timeout, delay int, runID string) error {
labels := "name=" + experimentsDetails.ExperimentName + "-np-" + runID
return retry.
@@ -198,8 +215,10 @@ func checkExistanceOfPolicy(experimentsDetails *experimentTypes.ExperimentDetail
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
npList, err := clients.KubeClient.NetworkingV1().NetworkPolicies(experimentsDetails.AppNS).List(context.Background(), v1.ListOptions{LabelSelector: labels})
if err != nil || len(npList.Items) == 0 {
return errors.Errorf("no network policy found, err: %v", err)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to list network policies: %s", err.Error())}
} else if len(npList.Items) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{labels: %s, namespace: %s}", labels, experimentsDetails.AppNS), Reason: "no network policy found with matching labels"}
}
return nil
})
@@ -215,8 +234,13 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
if err := checkExistanceOfPolicy(experimentsDetails, clients, 2, 1, runID); err != nil {
log.Infof("no active network policy found, err: %v", err)
if err := checkExistenceOfPolicy(experimentsDetails, clients, 2, 1, runID); err != nil {
if cerr, ok := err.(cerrors.Error); ok {
if strings.Contains(cerr.Reason, "no network policy found with matching labels") {
break
}
}
log.Infof("no active network policy found, err: %v", err.Error())
retry--
continue
}
@@ -224,10 +248,12 @@ func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, clients
if err := deleteNetworkPolicy(experimentsDetails, clients, targetPodList, chaosDetails, 2, 1, runID); err != nil {
log.Errorf("unable to delete network policy, err: %v", err)
}
retry--
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
err := cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Reason: "experiment is aborted"}
failStep, errCode := cerrors.GetRootCauseAndErrorCode(err, string(chaosDetails.Phase))
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, errCode)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("Chaos Revert Completed")
os.Exit(0)

View File

@@ -0,0 +1,260 @@
package lib
import (
"fmt"
"go.opentelemetry.io/otel"
"golang.org/x/net/context"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/rds"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
)
var (
err error
inject, abort chan os.Signal
)
func PrepareRDSInstanceStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRDSInstanceStop")
defer span.End()
// Inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// Abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Get the instance identifier or list of instance identifiers
instanceIdentifierList := strings.Split(experimentsDetails.RDSInstanceIdentifier, ",")
if experimentsDetails.RDSInstanceIdentifier == "" || len(instanceIdentifierList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no RDS instance identifier found to stop"}
}
instanceIdentifierList = common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, instanceIdentifierList)
log.Infof("[Chaos]: Number of instances targeted: %v", len(instanceIdentifierList))
// Watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIdentifierList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
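`common.FilterBasedOnPercentage` trims the identifier list down to `InstanceAffectedPerc` percent of the targets. A rough stand-in that keeps list order and always selects at least one instance (the real helper may choose targets differently, e.g. randomly; this is only a sketch of the sizing rule):

```go
package main

import "fmt"

// filterBasedOnPercentage returns roughly perc% of the given targets,
// never fewer than one (when any exist), mirroring how
// InstanceAffectedPerc narrows the instance identifier list.
func filterBasedOnPercentage(perc int, targets []string) []string {
	if len(targets) == 0 {
		return nil
	}
	n := len(targets) * perc / 100
	if n < 1 {
		n = 1 // always affect at least one target
	}
	if n > len(targets) {
		n = len(targets)
	}
	return targets[:n]
}

func main() {
	ids := []string{"rds-a", "rds-b", "rds-c", "rds-d"}
	fmt.Println(filterBasedOnPercentage(50, ids))
}
```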
// injectChaosInSerialMode stops the RDS instances in serial mode, i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops the RDS instances in parallel mode, i.e. all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// PowerOff the instance
for _, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
}
// Run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
for _, identifier := range instanceIdentifierList {
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// watching for the abort signal and revert the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, identifier := range instanceIdentifierList {
instanceState, err := awslib.GetRDSInstanceStatus(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" {
log.Info("[Abort]: Waiting for the RDS instance to get down")
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting RDS instance as abort signal received")
err := awslib.RDSInstanceStart(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("RDS instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}

View File

@@ -1,31 +1,38 @@
package lib
import (
"context"
"fmt"
"time"
redfishLib "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/baremetal/redfish-node-restart/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
//injectChaos initiates node restart chaos on the target node
func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
// injectChaos initiates node restart chaos on the target node
func injectChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectRedfishNodeRestartFault")
defer span.End()
URL := fmt.Sprintf("https://%v/redfish/v1/Systems/System.Embedded.1/Actions/ComputerSystem.Reset", experimentsDetails.IPMIIP)
return redfishLib.RebootNode(URL, experimentsDetails.User, experimentsDetails.Password)
}
//experimentExecution function orchestrates the experiment by calling the injectChaos function
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// experimentExecution function orchestrates the experiment by calling the injectChaos function
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -36,17 +43,19 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if err := injectChaos(experimentsDetails, clients); err != nil {
return err
if err := injectChaos(ctx, experimentsDetails, clients); err != nil {
return stacktrace.Propagate(err, "chaos injection failed")
}
log.Infof("[Chaos]:Waiting for: %vs", experimentsDetails.ChaosDuration)
log.Infof("[Chaos]: Waiting for: %vs", experimentsDetails.ChaosDuration)
time.Sleep(time.Duration(experimentsDetails.ChaosDuration) * time.Second)
return nil
}
//PrepareChaos contains the chaos prepration and injection steps
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// PrepareChaos contains the chaos preparation and injection steps
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRedfishNodeRestartFault")
defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -54,7 +63,7 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the Redfish node restart experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
}
common.SetTargets(experimentsDetails.IPMIIP, "targeted", "node", chaosDetails)


@@ -2,9 +2,9 @@ package lib
import (
"bytes"
"context"
"encoding/json"
"fmt"
corev1 "k8s.io/api/core/v1"
"net/http"
"os"
"os/signal"
@@ -12,6 +12,12 @@ import (
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
corev1 "k8s.io/api/core/v1"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -20,7 +26,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/spring-boot/spring-boot-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
)
@@ -38,8 +43,8 @@ func SetTargetPodList(experimentsDetails *experimentTypes.ExperimentDetails, cli
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
var err error
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "please provide one of the appLabel or TARGET_PODS"}
}
if experimentsDetails.TargetPodList, err = common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails); err != nil {
return err
@@ -49,7 +54,10 @@
}
// PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareSpringBootFault")
defer span.End()
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@@ -64,25 +72,18 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
"Controller": experimentsDetails.ChaosMonkeyWatchers.Controller,
"RestController": experimentsDetails.ChaosMonkeyWatchers.RestController,
})
log.InfoWithValues("[Info]: Chaos monkeys assaults will be injected to the target pods as follows", logrus.Fields{
"CPU Assault": experimentsDetails.ChaosMonkeyAssault.CPUActive,
"Memory Assault": experimentsDetails.ChaosMonkeyAssault.MemoryActive,
"Kill App Assault": experimentsDetails.ChaosMonkeyAssault.KillApplicationActive,
"Latency Assault": experimentsDetails.ChaosMonkeyAssault.LatencyActive,
"Exception Assault": experimentsDetails.ChaosMonkeyAssault.ExceptionsActive,
})
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err
if err := injectChaosInSerialMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return err
if err := injectChaosInParallelMode(ctx, experimentsDetails, clients, chaosDetails, eventsDetails, resultDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
@@ -98,25 +99,30 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
func CheckChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, targetPods corev1.PodList) (bool, error) {
hasErrors := false
targetPodNames := []string{}
for _, pod := range targetPods.Items {
targetPodNames = append(targetPodNames, pod.Name)
endpoint := "http://" + pod.Status.PodIP + ":" + chaosMonkeyPort + chaosMonkeyPath
log.Infof("[Check]: Checking pod: %v (endpoint: %v)", pod.Name, endpoint)
resp, err := http.Get(endpoint)
if err != nil {
log.Errorf("failed to request chaos monkey endpoint on pod %v (err: %v)", pod.Name, resp.StatusCode)
log.Errorf("failed to request chaos monkey endpoint on pod %s, %s", pod.Name, err.Error())
hasErrors = true
continue
}
if resp.StatusCode != 200 {
log.Errorf("failed to get chaos monkey endpoint on pod %v (status: %v)", pod.Name, resp.StatusCode)
log.Errorf("failed to get chaos monkey endpoint on pod %s (status: %d)", pod.Name, resp.StatusCode)
hasErrors = true
}
}
if hasErrors {
return false, errors.Errorf("failed to check chaos moonkey on at least one pod, check logs for details")
return false, cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{podNames: %s}", targetPodNames), Reason: "failed to check chaos monkey on at least one pod, check logs for details"}
}
return true, nil
}
@@ -130,7 +136,7 @@ func enableChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, pod corev
}
if resp.StatusCode != 200 {
return errors.Errorf("failed to enable chaos monkey endpoint on pod %v (status: %v)", pod.Name, resp.StatusCode)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to enable chaos monkey endpoint (status: %d)", resp.StatusCode)}
}
return nil
@@ -141,37 +147,33 @@ func setChaosMonkeyWatchers(chaosMonkeyPort string, chaosMonkeyPath string, watc
jsonValue, err := json.Marshal(watchers)
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey watchers, %s", err.Error())}
}
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/watchers", "application/json", bytes.NewBuffer(jsonValue))
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set watchers, %s", err.Error())}
}
if resp.StatusCode != 200 {
return errors.Errorf("failed to set assault on pod %v (status: %v)", pod.Name, resp.StatusCode)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set assault (status: %d)", resp.StatusCode)}
}
return nil
}
func startAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault experimentTypes.ChaosMonkeyAssault, pod corev1.Pod) error {
jsonValue, err := json.Marshal(assault)
if err != nil {
return err
}
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, jsonValue, pod); err != nil {
func startAssault(chaosMonkeyPort string, chaosMonkeyPath string, assault []byte, pod corev1.Pod) error {
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, assault, pod); err != nil {
return err
}
log.Infof("[Chaos]: Activating Chaos Monkey assault on pod: %v", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults/runtime/attack", "", nil)
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to start assault %s", err.Error())}
}
if resp.StatusCode != 200 {
return errors.Errorf("failed to activate runtime attack on pod %v (status: %v)", pod.Name, resp.StatusCode)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to activate runtime attack (status: %d)", resp.StatusCode)}
}
return nil
}
@@ -181,45 +183,47 @@ func setChaosMonkeyAssault(chaosMonkeyPort string, chaosMonkeyPath string, assau
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/assaults", "application/json", bytes.NewBuffer(assault))
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to set assault, %s", err.Error())}
}
if resp.StatusCode != 200 {
return errors.Errorf("failed to set assault on pod %v (status: %v)", pod.Name, resp.StatusCode)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to set assault (status: %d)", resp.StatusCode)}
}
return nil
}
// disableChaosMonkey disables chaos monkey on selected pods
func disableChaosMonkey(chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error {
log.Infof("[Chaos]: disabling assaults on pod %v", pod.Name)
func disableChaosMonkey(ctx context.Context, chaosMonkeyPort string, chaosMonkeyPath string, pod corev1.Pod) error {
log.Infof("[Chaos]: disabling assaults on pod %s", pod.Name)
jsonValue, err := json.Marshal(revertAssault)
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to marshal chaos monkey revert-chaos watchers, %s", err.Error())}
}
if err := setChaosMonkeyAssault(chaosMonkeyPort, chaosMonkeyPath, jsonValue, pod); err != nil {
return err
}
log.Infof("[Chaos]: disabling chaos monkey on pod %v", pod.Name)
log.Infof("[Chaos]: disabling chaos monkey on pod %s", pod.Name)
resp, err := http.Post("http://"+pod.Status.PodIP+":"+chaosMonkeyPort+chaosMonkeyPath+"/disable", "", nil)
if err != nil {
return err
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to call the chaos monkey api to disable assault, %s", err.Error())}
}
if resp.StatusCode != 200 {
return errors.Errorf("failed to disable chaos monkey endpoint on pod %v (status: %v)", pod.Name, resp.StatusCode)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", pod.Name, pod.Namespace), Reason: fmt.Sprintf("failed to disable chaos monkey endpoint (status: %d)", resp.StatusCode)}
}
return nil
}
// injectChaosInSerialMode injects chaos monkey assault on pods in serial mode(one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -273,14 +277,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
select {
case <-signChan:
log.Info("[Chaos]: Revert Started")
if err := disableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
@@ -291,8 +295,8 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
}
if err := disableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
return fmt.Errorf("error in disabling chaos monkey, err: %v", err)
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
return err
}
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
@@ -303,11 +307,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode injects chaos monkey assault on pods in parallel mode (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, eventsDetails *types.EventDetails, resultDetails *types.ResultDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectSpringBootFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -338,16 +344,17 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
})
if err := setChaosMonkeyWatchers(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyWatchers, pod); err != nil {
return errors.Errorf("[Chaos]: Failed to set watchers, err: %v ", err)
log.Errorf("[Chaos]: Failed to set watchers, err: %v", err)
return err
}
if err := startAssault(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, experimentsDetails.ChaosMonkeyAssault, pod); err != nil {
log.Errorf("[Chaos]: Failed to set assault, err: %v ", err)
log.Errorf("[Chaos]: Failed to set assault, err: %v", err)
return err
}
if err := enableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("[Chaos]: Failed to enable chaos, err: %v ", err)
log.Errorf("[Chaos]: Failed to enable chaos, err: %v", err)
return err
}
common.SetTargets(pod.Name, "injected", "pod", chaosDetails)
@@ -361,7 +368,7 @@ loop:
case <-signChan:
log.Info("[Chaos]: Revert Started")
for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
log.Errorf("Error in disabling chaos monkey, err: %v", err)
} else {
common.SetTargets(pod.Name, "reverted", "pod", chaosDetails)
@@ -369,7 +376,7 @@ loop:
}
// updating the chaosresult after stopped
failStep := "Chaos injection stopped!"
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep)
types.SetResultAfterCompletion(resultDetails, "Stopped", "Stopped", failStep, cerrors.ErrorTypeExperimentAborted)
result.ChaosResult(chaosDetails, clients, resultDetails, "EOT")
log.Info("[Chaos]: Revert Completed")
os.Exit(1)
@@ -382,7 +389,7 @@ loop:
var errorList []string
for _, pod := range experimentsDetails.TargetPodList.Items {
if err := disableChaosMonkey(experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
if err := disableChaosMonkey(ctx, experimentsDetails.ChaosMonkeyPort, experimentsDetails.ChaosMonkeyPath, pod); err != nil {
errorList = append(errorList, err.Error())
continue
}
@@ -390,7 +397,7 @@ loop:
}
if len(errorList) != 0 {
return fmt.Errorf("error in disabling chaos monkey, err: %v", strings.Join(errorList, ", "))
return cerrors.PreserveError{ErrString: fmt.Sprintf("error in disabling chaos monkey, [%s]", strings.Join(errorList, ","))}
}
return nil
}


@@ -3,6 +3,7 @@ package helper
import (
"bufio"
"bytes"
"context"
"fmt"
"io"
"os"
@@ -16,19 +17,26 @@ import (
"github.com/containerd/cgroups"
cgroupsv2 "github.com/containerd/cgroups/v2"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
clientTypes "k8s.io/apimachinery/pkg/types"
)
//list of cgroups in a container
// list of cgroups in a container
var (
cgroupSubsystemList = []string{"cpu", "memory", "systemd", "net_cls",
"net_prio", "freezer", "blkio", "perf_event", "devices", "cpuset",
@@ -49,7 +57,9 @@
)
// Helper injects the stress chaos
func Helper(clients clients.ClientSets) {
func Helper(ctx context.Context, clients clients.ClientSets) {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "SimulatePodStressFault")
defer span.End()
experimentsDetails := experimentTypes.ExperimentDetails{}
eventsDetails := types.EventDetails{}
@@ -72,6 +82,7 @@ func Helper(clients clients.ClientSets) {
// Initialise the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
chaosDetails.Phase = types.ChaosInjectPhase
// Initialise Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
@@ -80,145 +91,260 @@ func Helper(clients clients.ClientSets) {
result.SetResultUID(&resultDetails, clients, &chaosDetails)
if err := prepareStressChaos(&experimentsDetails, clients, &eventsDetails, &chaosDetails, &resultDetails); err != nil {
// update failstep inside chaosresult
if resultErr := result.UpdateFailedStepFromHelper(&resultDetails, &chaosDetails, clients, err); resultErr != nil {
log.Fatalf("helper pod failed, err: %v, resultErr: %v", err, resultErr)
}
log.Fatalf("helper pod failed, err: %v", err)
}
}
//prepareStressChaos contains the chaos preparation and injection steps
// prepareStressChaos contains the chaos preparation and injection steps
func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
// get stressors in list format
stressorList := prepareStressor(experimentsDetails)
if len(stressorList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "fail to prepare stressors"}
}
stressors := strings.Join(stressorList, " ")
targetList, err := common.ParseTargets(chaosDetails.ChaosPodName)
if err != nil {
return stacktrace.Propagate(err, "could not parse targets")
}
var targets []*targetDetails
for _, t := range targetList.Target {
td := &targetDetails{
Name: t.Name,
Namespace: t.Namespace,
Source: chaosDetails.ChaosPodName,
}
td.TargetContainers, err = common.GetTargetContainers(t.Name, t.Namespace, t.TargetContainer, chaosDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get target containers")
}
td.ContainerIds, err = common.GetContainerIDs(td.Namespace, td.Name, td.TargetContainers, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container ids")
}
for _, cid := range td.ContainerIds {
// extract out the pid of the target container
pid, err := common.GetPID(experimentsDetails.ContainerRuntime, cid, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
td.Pids = append(td.Pids, pid)
}
for i := range td.Pids {
cGroupManagers, err, grpPath := getCGroupManager(td, i)
if err != nil {
return stacktrace.Propagate(err, "could not get cgroup manager")
}
td.GroupPath = grpPath
td.CGroupManagers = append(td.CGroupManagers, cGroupManagers)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"TargetContainers": td.TargetContainers,
})
targets = append(targets, td)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(targets, resultDetails.Name, chaosDetails.ChaosNamespace)
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(1)
default:
}
containerID, err := common.GetContainerID(experimentsDetails.AppNS, experimentsDetails.TargetPods, experimentsDetails.TargetContainer, clients)
if err != nil {
return err
}
// extract out the pid of the target container
targetPID, err := common.GetPID(experimentsDetails.ContainerRuntime, containerID, experimentsDetails.SocketPath)
if err != nil {
return err
}
done := make(chan error, 1)
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
cgroupManager, err := getCGroupManager(int(targetPID), containerID)
if err != nil {
return errors.Errorf("fail to get the cgroup manager, err: %v", err)
}
// get stressors in list format
stressorList := prepareStressor(experimentsDetails)
if len(stressorList) == 0 {
return errors.Errorf("fail to prepare stressor for %v experiment", experimentsDetails.ExperimentName)
}
stressors := strings.Join(stressorList, " ")
stressCommand := "pause nsutil -t " + strconv.Itoa(targetPID) + " -p -- " + stressors
log.Infof("[Info]: starting process: %v", stressCommand)
// launch the stress-ng process on the target container in paused mode
cmd := exec.Command("/bin/bash", "-c", stressCommand)
// enables the process group id
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
var buf bytes.Buffer
cmd.Stdout = &buf
err = cmd.Start()
if err != nil {
return errors.Errorf("fail to start the stress process %v, err: %v", stressCommand, err)
}
// watching for the abort signal and revert the chaos if an abort signal is received
go abortWatcher(cmd.Process.Pid, resultDetails.Name, chaosDetails.ChaosNamespace, experimentsDetails.TargetPods)
// add the stress process to the cgroup of target container
if err = addProcessToCgroup(cmd.Process.Pid, cgroupManager); err != nil {
if killErr := cmd.Process.Kill(); killErr != nil {
return errors.Errorf("stressors failed killing %v process, err: %v", cmd.Process.Pid, killErr)
}
return errors.Errorf("fail to add the stress process into target container cgroup, err: %v", err)
}
log.Info("[Info]: Sending signal to resume the stress process")
// wait for the process to start before sending the resume signal
// TODO: need a dynamic way to check the start of the process
time.Sleep(700 * time.Millisecond)
// remove pause and resume or start the stress process
if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
return errors.Errorf("fail to remove pause and start the stress process: %v", err)
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
done := make(chan error)
go func() { done <- cmd.Wait() }()
// check the timeout for the command
// Note: timeout will occur when process didn't complete even after 10s of chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
if err = terminateProcess(cmd.Process.Pid); err != nil {
return err
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", experimentsDetails.TargetPods); err != nil {
return err
}
return nil
case err := <-done:
for index, t := range targets {
for i := range t.Pids {
cmd, err := injectChaos(t, stressors, i, experimentsDetails.StressType)
if err != nil {
err, ok := err.(*exec.ExitError)
if ok {
status := err.Sys().(syscall.WaitStatus)
if status.Signaled() && status.Signal() == syscall.SIGKILL {
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
return errors.Errorf("process stopped with SIGKILL signal")
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
targets[index].Cmds = append(targets[index].Cmds, cmd)
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
}
}
// record the event inside chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
log.Info("[Wait]: Waiting for chaos completion")
// channel to check the completion of the stress process
go func() {
var errList []string
var exitErr error
for _, t := range targets {
for i := range t.Cmds {
if err := t.Cmds[i].Cmd.Wait(); err != nil {
log.Infof("stress process failed, err: %v, out: %v", err, t.Cmds[i].Buffer.String())
if _, ok := err.(*exec.ExitError); ok {
exitErr = err
continue
}
errList = append(errList, err.Error())
}
}
}
if exitErr != nil {
oomKilled, err := checkOOMKilled(targets, clients, exitErr)
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- exitErr
}
done <- nil
} else if len(errList) != 0 {
oomKilled, err := checkOOMKilled(targets, clients, fmt.Errorf("err: %v", strings.Join(errList, ", ")))
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
}
done <- nil
} else {
done <- nil
}
}()
// check the timeout for the command
// Note: timeout will occur when process didn't complete even after 30s of chaos duration
timeout := time.After((time.Duration(experimentsDetails.ChaosDuration) + 30) * time.Second)
select {
case <-timeout:
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
case err := <-done:
if err != nil {
exitErr, ok := err.(*exec.ExitError)
if ok {
status := exitErr.Sys().(syscall.WaitStatus)
if status.Signaled() {
log.Infof("process stopped with signal: %v", status.Signal())
}
if status.Signaled() && status.Signal() == syscall.SIGKILL {
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeExperimentAborted, Source: chaosDetails.ChaosPodName, Reason: "process stopped with SIGKILL signal"}
}
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: chaosDetails.ChaosPodName, Reason: err.Error()}
}
log.Info("[Info]: Reverting Chaos")
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
func revertChaosForAllTargets(targets []*targetDetails, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
if err := terminateProcess(targets[i]); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// checkOOMKilled checks if the container within the target pods failed due to an OOMKilled error.
func checkOOMKilled(targets []*targetDetails, clients clients.ClientSets, chaosError error) (bool, error) {
// Check each container in the pod
for i := 0; i < 3; i++ {
for _, t := range targets {
// Fetch the target pod
targetPod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return false, cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace),
Reason: err.Error(),
}
}
for _, c := range targetPod.Status.ContainerStatuses {
if utils.Contains(c.Name, t.TargetContainers) {
// Check for OOMKilled and restart
if c.LastTerminationState.Terminated != nil && c.LastTerminationState.Terminated.ExitCode == 137 {
log.Warnf("[Warning]: The target container '%s' of pod '%s' got OOM Killed, err: %v", c.Name, t.Name, chaosError)
return true, nil
}
}
}
}
time.Sleep(1 * time.Second)
}
return false, nil
}
// terminateProcess will remove the stress process from the target container after chaos completion
func terminateProcess(t *targetDetails) error {
var errList []string
for i := range t.Cmds {
if t.Cmds[i] != nil && t.Cmds[i].Cmd.Process != nil {
if err := syscall.Kill(-t.Cmds[i].Cmd.Process.Pid, syscall.SIGKILL); err != nil {
if strings.Contains(err.Error(), ProcessAlreadyKilled) || strings.Contains(err.Error(), ProcessAlreadyFinished) {
continue
}
errList = append(errList, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[i]), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}.Error())
continue
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// prepareStressor will set the required stressors for the given experiment
func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []string {
stressArgs := []string{
@ -281,33 +407,33 @@ func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []str
}
default:
log.Fatalf("stressor for %v experiment is not supported", experimentDetails.ExperimentName)
}
return stressArgs
}
// pidPath will get the pid path of the container
func pidPath(t *targetDetails, index int) cgroups.Path {
processPath := "/proc/" + strconv.Itoa(t.Pids[index]) + "/cgroup"
paths, err := parseCgroupFile(processPath, t, index)
if err != nil {
return getErrorPath(errors.Wrapf(err, "parse cgroup file %s", processPath))
}
return getExistingPath(paths, t.Pids[index], "")
}
// parseCgroupFile will read and verify the cgroup file entry of a container
func parseCgroupFile(path string, t *targetDetails, index int) (map[string]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to parse cgroup: %s", err.Error())}
}
defer file.Close()
return parseCgroupFromReader(file, t, index)
}
// parseCgroupFromReader will parse the cgroup file from the reader
func parseCgroupFromReader(r io.Reader, t *targetDetails, index int) (map[string]string, error) {
var (
cgroups = make(map[string]string)
s = bufio.NewScanner(r)
@ -318,7 +444,7 @@ func parseCgroupFromReader(r io.Reader) (map[string]string, error) {
parts = strings.SplitN(text, ":", 3)
)
if len(parts) < 3 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("invalid cgroup entry: %q", text)}
}
for _, subs := range strings.Split(parts[1], ",") {
if subs != "" {
@ -327,13 +453,13 @@ func parseCgroupFromReader(r io.Reader) (map[string]string, error) {
}
}
if err := s.Err(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("buffer scanner failed: %s", err.Error())}
}
return cgroups, nil
}
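Each line of `/proc/<pid>/cgroup` has the form `hierarchy-ID:controller-list:cgroup-path`, which is why the parser above uses `SplitN` with a limit of 3 (so a path containing ':' survives) and then splits the controller list on commas. A self-contained sketch of that split (`splitCgroupEntry` is an illustrative helper, not a litmus-go function):

```go
package main

import (
	"fmt"
	"strings"
)

// splitCgroupEntry splits one /proc/<pid>/cgroup line of the form
// "hierarchy-ID:controller-list:cgroup-path", as the parser above does.
func splitCgroupEntry(line string) (subsystems []string, path string, err error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) < 3 {
		return nil, "", fmt.Errorf("invalid cgroup entry: %q", line)
	}
	for _, s := range strings.Split(parts[1], ",") {
		if s != "" {
			subsystems = append(subsystems, s)
		}
	}
	return subsystems, parts[2], nil
}

func main() {
	subs, path, err := splitCgroupEntry("4:cpu,cpuacct:/kubepods/pod42/abc")
	fmt.Println(subs, path, err) // [cpu cpuacct] /kubepods/pod42/abc <nil>

	// cgroup v2 entries carry an empty controller list ("0::/path").
	subs, path, _ = splitCgroupEntry("0::/kubepods.slice/pod42")
	fmt.Println(subs, path) // [] /kubepods.slice/pod42
}
```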
// getExistingPath will be used to get the existing valid cgroup path
func getExistingPath(paths map[string]string, pid int, suffix string) cgroups.Path {
for n, p := range paths {
dest, err := getCgroupDestination(pid, n)
@ -363,14 +489,14 @@ func getExistingPath(paths map[string]string, pid int, suffix string) cgroups.Pa
}
}
// getErrorPath will give the invalid cgroup path
func getErrorPath(err error) cgroups.Path {
return func(_ cgroups.Name) (string, error) {
return "", err
}
}
// getCgroupDestination will validate the subsystem with the mountpath in container mountinfo file.
func getCgroupDestination(pid int, subsystem string) (string, error) {
mountinfoPath := fmt.Sprintf("/proc/%d/mountinfo", pid)
file, err := os.Open(mountinfoPath)
@ -393,28 +519,25 @@ func getCgroupDestination(pid int, subsystem string) (string, error) {
return "", errors.Errorf("no destination found for %v ", subsystem)
}
// findValidCgroup will be used to get a valid cgroup path
func findValidCgroup(path cgroups.Path, t *targetDetails, index int) (string, error) {
for _, subsystem := range cgroupSubsystemList {
path, err := path(cgroups.Name(subsystem))
if err != nil {
log.Errorf("fail to retrieve the cgroup path, subsystem: %v, target: %v, err: %v", subsystem, t.ContainerIds[index], err)
continue
}
if strings.Contains(path, t.ContainerIds[index]) {
return path, nil
}
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: "could not find valid cgroup"}
}
// getENV fetches all the env variables from the runner pod
func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "")
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.TargetContainer = types.Getenv("APP_CONTAINER", "")
experimentDetails.TargetPods = types.Getenv("APP_POD", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
@ -433,7 +556,7 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(targets []*targetDetails, resultName, chaosNS string) {
<-abort
@ -442,53 +565,133 @@ func abortWatcher(targetPID int, resultName, chaosNS, targetPodName string) {
// retry thrice for the chaos revert
retry := 3
for retry > 0 {
for _, t := range targets {
if err = terminateProcess(t); err != nil {
log.Errorf("[Abort]: unable to revert for %v pod, err :%v", t.Name, err)
continue
}
if err = result.AnnotateChaosResult(resultName, chaosNS, "reverted", "pod", t.Name); err != nil {
log.Errorf("[Abort]: Unable to annotate the chaosresult for %v pod, err :%v", t.Name, err)
}
}
retry--
time.Sleep(1 * time.Second)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
// getCGroupManager will return the cgroup for the given pid of the process
func getCGroupManager(t *targetDetails, index int) (interface{}, error, string) {
if cgroups.Mode() == cgroups.Unified {
groupPath := ""
output, err := exec.Command("bash", "-c", fmt.Sprintf("nsenter -t 1 -C -m -- cat /proc/%v/cgroup", t.Pids[index])).CombinedOutput()
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to get the cgroup: %s :%v", err.Error(), output)}, ""
}
log.Infof("cgroup output: %s", string(output))
parts := strings.Split(string(output), ":")
if len(parts) < 3 {
return "", fmt.Errorf("invalid cgroup entry: %s", string(output)), ""
}
if strings.HasSuffix(parts[len(parts)-3], "0") && parts[len(parts)-2] == "" {
groupPath = parts[len(parts)-1]
}
log.Infof("group path: %s", groupPath)
cgroup2, err := cgroupsv2.LoadManager("/sys/fs/cgroup", string(groupPath))
if err != nil {
return nil, errors.Errorf("Error loading cgroup v2 manager, %v", err), ""
}
return cgroup2, nil, groupPath
}
path := pidPath(t, index)
cgroup, err := findValidCgroup(path, t, index)
if err != nil {
return nil, stacktrace.Propagate(err, "could not find valid cgroup"), ""
}
cgroup1, err := cgroups.Load(cgroups.V1, cgroups.StaticPath(cgroup))
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to load the cgroup: %s", err.Error())}, ""
}
return cgroup1, nil, ""
}
// addProcessToCgroup will add the process to cgroup
// By default it will add to v1 cgroup
func addProcessToCgroup(pid int, control interface{}, groupPath string) error {
if cgroups.Mode() == cgroups.Unified {
args := []string{"-t", "1", "-C", "--", "sudo", "sh", "-c", fmt.Sprintf("echo %d >> /sys/fs/cgroup%s/cgroup.procs", pid, strings.ReplaceAll(groupPath, "\n", ""))}
output, err := exec.Command("nsenter", args...).CombinedOutput()
if err != nil {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("failed to add process to cgroup %s: %v", string(output), err),
}
}
return nil
}
var cgroup1 = control.(cgroups.Cgroup)
return cgroup1.Add(cgroups.Process{Pid: pid})
}
func injectChaos(t *targetDetails, stressors string, index int, stressType string) (*Command, error) {
stressCommand := fmt.Sprintf("pause nsutil -t %v -p -- %v", strconv.Itoa(t.Pids[index]), stressors)
// for io stress,we need to enter into mount ns of the target container
// enabling it by passing -m flag
if stressType == "pod-io-stress" {
stressCommand = fmt.Sprintf("pause nsutil -t %v -p -m -- %v", strconv.Itoa(t.Pids[index]), stressors)
}
log.Infof("[Info]: starting process: %v", stressCommand)
// launch the stress-ng process on the target container in paused mode
cmd := exec.Command("/bin/bash", "-c", stressCommand)
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
var buf bytes.Buffer
cmd.Stdout = &buf
cmd.Stderr = &buf
if err := cmd.Start(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("failed to start stress process: %s", err.Error())}
}
// add the stress process to the cgroup of target container
if err := addProcessToCgroup(cmd.Process.Pid, t.CGroupManagers[index], t.GroupPath); err != nil {
if killErr := cmd.Process.Kill(); killErr != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup %s and kill stress process: %s", err.Error(), killErr.Error())}
}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup: %s", err.Error())}
}
log.Info("[Info]: Sending signal to resume the stress process")
// wait for the process to start before sending the resume signal
// TODO: need a dynamic way to check the start of the process
time.Sleep(700 * time.Millisecond)
// remove pause and resume or start the stress process
if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to remove pause and start the stress process: %s", err.Error())}
}
return &Command{
Cmd: cmd,
Buffer: buf,
}, nil
}
type targetDetails struct {
Name string
Namespace string
TargetContainers []string
ContainerIds []string
Pids []int
CGroupManagers []interface{}
Cmds []*Command
Source string
GroupPath string
}
type Command struct {
Cmd *exec.Cmd
Buffer bytes.Buffer
}


@ -2,29 +2,34 @@ package lib
import (
"context"
"fmt"
"os"
"strconv"
"strings"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareAndInjectStressChaos contains the prepration & injection steps for the stress experiments.
func PrepareAndInjectStressChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodStressFault")
defer span.End()
var err error
//Set up the tunables if provided in range
SetChaosTunables(experimentsDetails)
switch experimentsDetails.StressType {
@ -56,36 +61,14 @@ func PrepareAndInjectStressChaos(experimentsDetails *experimentTypes.ExperimentD
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
targetPodList, err := common.GetTargetPods(experimentsDetails.NodeLabel, experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("[Info]: Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
@ -96,41 +79,41 @@ func PrepareAndInjectStressChaos(experimentsDetails *experimentTypes.ExperimentD
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return stacktrace.Propagate(err, "could not set helper data")
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
return nil
}
// injectChaosInSerialMode injects the stress chaos in all target applications serially (one by one)
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInSerialMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@ -140,10 +123,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
@ -151,115 +131,69 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
"NodeName": pod.Spec.NodeName,
"ContainerName": experimentsDetails.TargetContainer,
})
runID := stringutils.GetRunID()
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, fmt.Sprintf("%s:%s:%s", pod.Name, pod.Namespace, experimentsDetails.TargetContainer), pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
}
// injectChaosInParallelMode injects the stress chaos in all target applications in parallel mode (all at once)
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInParallelMode")
defer span.End()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform stress chaos
runID := stringutils.GetRunID()
targets := common.FilterPodsForNodes(targetPodList, experimentsDetails.TargetContainer)
for node, tar := range targets {
var targetsPerNode []string
for _, k := range tar.Target {
targetsPerNode = append(targetsPerNode, fmt.Sprintf("%s:%s:%s", k.Name, k.Namespace, k.TargetContainer))
}
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, strings.Join(targetsPerNode, ";"), node, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
}
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates the helper pod
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodStressFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
@ -301,7 +235,7 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
"./helpers -name stress-chaos",
},
Resources: chaosDetails.Resources,
Env: getPodEnv(experimentsDetails, podName),
Env: getPodEnv(ctx, experimentsDetails, targets),
VolumeMounts: []apiv1.VolumeMount{
{
Name: "socket-path",
@ -326,18 +260,23 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if len(chaosDetails.SideCar) != 0 {
helperPod.Spec.Containers = append(helperPod.Spec.Containers, common.BuildSidecar(chaosDetails)...)
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getPodEnv derives all the envs required for the helper pod
func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName string) []apiv1.EnvVar {
func getPodEnv(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targets string) []apiv1.EnvVar {
var envDetails common.ENVDetails
envDetails.SetEnv("APP_NAMESPACE", experimentsDetails.AppNS).
SetEnv("APP_POD", podName).
SetEnv("APP_CONTAINER", experimentsDetails.TargetContainer).
envDetails.SetEnv("TARGETS", targets).
SetEnv("TOTAL_CHAOS_DURATION", strconv.Itoa(experimentsDetails.ChaosDuration)).
SetEnv("CHAOS_NAMESPACE", experimentsDetails.ChaosNamespace).
SetEnv("CHAOSENGINE", experimentsDetails.EngineName).
@ -354,6 +293,8 @@ func getPodEnv(experimentsDetails *experimentTypes.ExperimentDetails, podName st
SetEnv("VOLUME_MOUNT_PATH", experimentsDetails.VolumeMountPath).
SetEnv("STRESS_TYPE", experimentsDetails.StressType).
SetEnv("INSTANCE_ID", experimentsDetails.InstanceID).
SetEnv("OTEL_EXPORTER_OTLP_ENDPOINT", os.Getenv(telemetry.OTELExporterOTLPEndpoint)).
SetEnv("TRACE_PARENT", telemetry.GetMarshalledSpanFromContext(ctx)).
SetEnvFromDownwardAPI("v1", "metadata.name")
return envDetails.ENV
@ -363,8 +304,8 @@ func ptrint64(p int64) *int64 {
return &p
}
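The env assembly above relies on a fluent builder: each `SetEnv` call appends a key/value pair and returns the receiver so calls chain. A minimal sketch of that pattern (the names `envDetails`/`SetEnv` here are illustrative stand-ins for the real `common.ENVDetails`):

```go
package main

import "fmt"

// envDetails mimics the chained SetEnv builder used above: each call
// appends a KEY=VALUE entry and returns the receiver so calls chain.
// Empty values are skipped, which lets optional tunables be set
// unconditionally in one chain.
type envDetails struct {
	env []string
}

func (e *envDetails) SetEnv(key, value string) *envDetails {
	if value != "" {
		e.env = append(e.env, fmt.Sprintf("%s=%s", key, value))
	}
	return e
}

func main() {
	var d envDetails
	d.SetEnv("TOTAL_CHAOS_DURATION", "60").
		SetEnv("INSTANCE_ID", ""). // empty: skipped
		SetEnv("STRESS_TYPE", "pod-cpu-stress")
	fmt.Println(d.env)
}
```

The real builder stores `apiv1.EnvVar` values rather than strings, but the chaining and skip-empty behavior follow the same shape.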
//SetChaosTunables will setup a random value within a given range of values
//If the value is not provided in range it'll setup the initial provided value.
// SetChaosTunables will set up a random value within a given range of values
// If the value is not provided in range it'll set up the initial provided value.
func SetChaosTunables(experimentsDetails *experimentTypes.ExperimentDetails) {
experimentsDetails.CPUcores = common.ValidateRange(experimentsDetails.CPUcores)
experimentsDetails.CPULoad = common.ValidateRange(experimentsDetails.CPULoad)


@ -1,28 +1,34 @@
package lib
import (
"context"
"fmt"
"os"
"os/signal"
"strings"
"syscall"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/cloud/vmware"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/vmware/vm-poweroff/types"
"github.com/pkg/errors"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
)
var inject, abort chan os.Signal
// InjectVMPowerOffChaos injects the chaos in serial or parallel mode
func InjectVMPowerOffChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, cookie string) error {
func InjectVMPowerOffChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, cookie string) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareVMPowerOffFault")
defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
@ -47,15 +53,15 @@ func InjectVMPowerOffChaos(experimentsDetails *experimentTypes.ExperimentDetails
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err := injectChaosInSerialMode(experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err := injectChaosInSerialMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err := injectChaosInParallelMode(experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err := injectChaosInParallelMode(ctx, experimentsDetails, vmIdList, cookie, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@ -68,7 +74,10 @@ func InjectVMPowerOffChaos(experimentsDetails *experimentTypes.ExperimentDetails
}
// injectChaosInSerialMode stops VMs in serial mode i.e. one after the other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInSerialMode")
defer span.End()
select {
case <-inject:
// stopping the chaos execution, if abort signal received
@ -93,7 +102,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("failed to stop %s vm: %s", vmId, err.Error())
return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
}
common.SetTargets(vmId, "injected", "VM", chaosDetails)
@ -101,14 +110,14 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("vm %s failed to successfully shutdown, err: %s", vmId, err.Error())
return stacktrace.Propagate(err, "VM shutdown failed")
}
//Run the probes during the chaos
//The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -119,13 +128,13 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
//Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("failed to start back %s vm: %s", vmId, err.Error())
return stacktrace.Propagate(err, "failed to start back vm")
}
//Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("vm %s failed to successfully start, err: %s", vmId, err.Error())
return stacktrace.Propagate(err, "vm failed to start")
}
common.SetTargets(vmId, "reverted", "VM", chaosDetails)
@ -139,7 +148,9 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode stops VMs in parallel mode i.e. all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, vmIdList []string, cookie string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "injectVMPowerOffFaultInParallelMode")
defer span.End()
select {
case <-inject:
@ -165,7 +176,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Stopping the VM
log.Infof("[Chaos]: Stopping %s VM", vmId)
if err := vmware.StopVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("failed to stop %s vm: %s", vmId, err.Error())
return stacktrace.Propagate(err, fmt.Sprintf("failed to stop %s vm", vmId))
}
common.SetTargets(vmId, "injected", "VM", chaosDetails)
@ -176,14 +187,14 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for the VM to completely stop
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_OFF state", vmId)
if err := vmware.WaitForVMStop(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("vm %s failed to successfully shutdown, err: %s", vmId, err.Error())
return stacktrace.Propagate(err, "vm failed to shutdown")
}
}
//Running the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
@ -196,7 +207,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Starting the VM
log.Infof("[Chaos]: Starting back %s VM", vmId)
if err := vmware.StartVM(experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("failed to start back %s vm: %s", vmId, err.Error())
return stacktrace.Propagate(err, fmt.Sprintf("failed to start back %s vm", vmId))
}
}
@ -205,7 +216,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
//Wait for the VM to completely start
log.Infof("[Wait]: Wait for VM '%s' to get in POWERED_ON state", vmId)
if err := vmware.WaitForVMStart(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.VcenterServer, vmId, cookie); err != nil {
return errors.Errorf("vm %s failed to successfully start, err: %s", vmId, err.Error())
return stacktrace.Propagate(err, "vm failed to successfully start")
}
}
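Both serial and parallel modes gate on `WaitForVMStop`/`WaitForVMStart`, which poll the vCenter API until the VM reaches the desired power state or the experiment `Timeout` elapses, sleeping `Delay` between attempts. A minimal sketch of that poll-until-state pattern, with a hypothetical `getState` standing in for the vCenter call:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForState sketches the WaitForVMStop/WaitForVMStart pattern:
// poll a state getter every `delay` seconds until it reports the
// desired power state, failing once `timeout` seconds have elapsed.
func waitForState(timeout, delay int, want string, getState func() (string, error)) error {
	deadline := time.Now().Add(time.Duration(timeout) * time.Second)
	for time.Now().Before(deadline) {
		state, err := getState()
		if err == nil && state == want {
			return nil
		}
		time.Sleep(time.Duration(delay) * time.Second)
	}
	return errors.New("timed out waiting for state " + want)
}

func main() {
	calls := 0
	// fake getter: reaches POWERED_OFF on the third poll
	getState := func() (string, error) {
		calls++
		if calls >= 3 {
			return "POWERED_OFF", nil
		}
		return "POWERED_ON", nil
	}
	err := waitForState(5, 0, "POWERED_OFF", getState)
	fmt.Println(err, calls)
}
```

Transient getter errors are treated like a wrong state and simply retried, which matches the tolerant polling the experiment needs while a VM transitions.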


@ -1,272 +0,0 @@
package lib
import (
"context"
"strconv"
"time"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
appsv1 "k8s.io/api/apps/v1"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PreparePodDelete contains the preparation steps before chaos injection
func PreparePodDelete(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.ChaosServiceAccount == "" {
// Getting the serviceAccountName for the powerfulseal pod
err := GetServiceAccount(experimentsDetails, clients)
if err != nil {
return errors.Errorf("Unable to get the serviceAccountName, err: %v", err)
}
}
// generating a unique string which can be appended to the powerfulseal deployment name & labels for unique identification
runID := common.GetRunID()
// generating the chaos inject event in the chaosengine
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on application pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// Creating configmap for powerfulseal deployment
err := CreateConfigMap(experimentsDetails, clients, runID)
if err != nil {
return err
}
// Creating powerfulseal deployment
err = CreatePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to create the helper pod, err: %v", err)
}
//checking the status of the powerfulseal pod, wait till the powerfulseal pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
err = status.CheckApplicationStatus(experimentsDetails.ChaosNamespace, "name=powerfulseal-"+runID, experimentsDetails.Timeout, experimentsDetails.Delay, clients)
if err != nil {
return errors.Errorf("powerfulseal pod is not in running state, err: %v", err)
}
// Wait for Chaos Duration
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
//Deleting the powerfulseal deployment
log.Info("[Cleanup]: Deleting the powerfulseal deployment")
err = DeletePowerfulsealDeployment(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal deployment, err: %v", err)
}
//Deleting the powerfulseal configmap
log.Info("[Cleanup]: Deleting the powerfulseal configmap")
err = DeletePowerfulsealConfigmap(experimentsDetails, clients, runID)
if err != nil {
return errors.Errorf("Unable to delete the powerfulseal configmap, err: %v", err)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// GetServiceAccount finds the serviceAccountName for the powerfulseal deployment
func GetServiceAccount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Get(context.Background(), experimentsDetails.ChaosPodName, v1.GetOptions{})
if err != nil {
return err
}
experimentsDetails.ChaosServiceAccount = pod.Spec.ServiceAccountName
return nil
}
// CreateConfigMap creates a configmap for the powerfulseal deployment
func CreateConfigMap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
data := map[string]string{}
// It will store all the details inside a string in a well-formatted way
policy := GetConfigMapData(experimentsDetails)
data["policy"] = policy
configMap := &apiv1.ConfigMap{
ObjectMeta: v1.ObjectMeta{
Name: "policy-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"name": "policy-" + runID,
},
},
Data: data,
}
_, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Create(context.Background(), configMap, v1.CreateOptions{})
return err
}
// GetConfigMapData generates the configmap data for the powerfulseal deployment in the desired format
func GetConfigMapData(experimentsDetails *experimentTypes.ExperimentDetails) string {
waitTime, _ := strconv.Atoi(experimentsDetails.ChaosInterval)
policy := "config:" + "\n" +
" minSecondsBetweenRuns: 1" + "\n" +
" maxSecondsBetweenRuns: " + strconv.Itoa(waitTime) + "\n" +
"podScenarios:" + "\n" +
" - name: \"delete random pods in application namespace\"" + "\n" +
" match:" + "\n" +
" - labels:" + "\n" +
" namespace: " + experimentsDetails.AppNS + "\n" +
" selector: " + experimentsDetails.AppLabel + "\n" +
" filters:" + "\n" +
" - randomSample:" + "\n" +
" size: 1" + "\n" +
" actions:" + "\n" +
" - kill:" + "\n" +
" probability: 0.77" + "\n" +
" force: " + strconv.FormatBool(experimentsDetails.Force)
return policy
}
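The concatenation above is easier to read as a template, which keeps the YAML structure of the powerfulseal policy visible. A sketch of the same document rendered with `text/template` (field values here are illustrative; the repo's own string-building version remains the source of truth):

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// policyTmpl reproduces the powerfulseal policy assembled above by
// string concatenation.
const policyTmpl = `config:
  minSecondsBetweenRuns: 1
  maxSecondsBetweenRuns: {{.WaitTime}}
podScenarios:
  - name: "delete random pods in application namespace"
    match:
      - labels:
          namespace: {{.AppNS}}
          selector: {{.AppLabel}}
    filters:
      - randomSample:
          size: 1
    actions:
      - kill:
          probability: 0.77
          force: {{.Force}}
`

// renderPolicy fills the template with the experiment tunables.
func renderPolicy(waitTime int, appNS, appLabel string, force bool) string {
	t := template.Must(template.New("policy").Parse(policyTmpl))
	var buf bytes.Buffer
	if err := t.Execute(&buf, map[string]interface{}{
		"WaitTime": waitTime, "AppNS": appNS, "AppLabel": appLabel, "Force": force,
	}); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	fmt.Print(renderPolicy(10, "default", "app=nginx", false))
}
```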
// CreatePowerfulsealDeployment derives the attributes for the powerfulseal deployment and creates it
func CreatePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
deployment := &appsv1.Deployment{
ObjectMeta: v1.ObjectMeta{
Name: "powerfulseal-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: map[string]string{
"app": "powerfulseal",
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
"app.kubernetes.io/part-of": "litmus",
},
},
Spec: appsv1.DeploymentSpec{
Selector: &v1.LabelSelector{
MatchLabels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Replicas: func(i int32) *int32 { return &i }(1),
Template: apiv1.PodTemplateSpec{
ObjectMeta: v1.ObjectMeta{
Labels: map[string]string{
"name": "powerfulseal-" + runID,
"chaosUID": string(experimentsDetails.ChaosUID),
},
},
Spec: apiv1.PodSpec{
Volumes: []apiv1.Volume{
{
Name: "policyfile",
VolumeSource: apiv1.VolumeSource{
ConfigMap: &apiv1.ConfigMapVolumeSource{
LocalObjectReference: apiv1.LocalObjectReference{
Name: "policy-" + runID,
},
},
},
},
},
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
TerminationGracePeriodSeconds: func(i int64) *int64 { return &i }(0),
Containers: []apiv1.Container{
{
Name: "powerfulseal",
Image: "ksatchit/miko-powerfulseal:non-ssh",
Args: []string{
"autonomous",
"--inventory-kubernetes",
"--no-cloud",
"--policy-file=/root/policy_kill_random_default.yml",
"--use-pod-delete-instead-of-ssh-kill",
},
VolumeMounts: []apiv1.VolumeMount{
{
Name: "policyfile",
MountPath: "/root/policy_kill_random_default.yml",
SubPath: "policy",
},
},
},
},
},
},
},
}
_, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Create(context.Background(), deployment, v1.CreateOptions{})
return err
}
//DeletePowerfulsealDeployment deletes the powerfulseal deployment
func DeletePowerfulsealDeployment(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).Delete(context.Background(), "powerfulseal-"+runID, v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaosNamespace).List(context.Background(), v1.ListOptions{LabelSelector: "name=powerfulseal-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("Deployment is not deleted yet, err: %v", err)
}
return nil
})
return err
}
//DeletePowerfulsealConfigmap deletes the powerfulseal configmap
func DeletePowerfulsealConfigmap(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, runID string) error {
err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).Delete(context.Background(), "policy-"+runID, v1.DeleteOptions{})
if err != nil {
return err
}
err = retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
podSpec, err := clients.KubeClient.CoreV1().ConfigMaps(experimentsDetails.ChaosNamespace).List(context.Background(), v1.ListOptions{LabelSelector: "name=policy-" + runID})
if err != nil || len(podSpec.Items) != 0 {
return errors.Errorf("configmap is not deleted yet, err: %v", err)
}
return nil
})
return err
}


@ -1,364 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
"time"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/container-kill/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
//PrepareContainerKill contains the preparation steps before chaos injection
func PrepareContainerKill(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
//Setup the tunables if provided in range
litmusLIB.SetChaosTunables(experimentsDetails)
log.InfoWithValues("[Info]: The tunables are:", logrus.Fields{
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
experimentsDetails.IsTargetContainerProvided = (experimentsDetails.TargetContainer != "")
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode kills the containers of all target application pods serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
//getRestartCount returns the restart count of the target container
restartCountBefore := getRestartCount(pod, experimentsDetails.TargetContainer)
log.Infof("restartCount of target container before chaos injection: %v", restartCountBefore)
runID := common.GetRunID()
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"Target Container": experimentsDetails.TargetContainer,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// It will verify that the restart count of container should increase after chaos injection
if err := verifyRestartCount(experimentsDetails, pod, clients, restartCountBefore); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("target container is not restarted, err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode kills the containers of all target application pods in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
var err error
//getRestartCountAll returns the restart counts of the target containers
restartCountBefore := getRestartCountAll(targetPodList, experimentsDetails.TargetContainer)
log.Infof("restartCount of target containers before chaos injection: %v", restartCountBefore)
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, pod.Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"Target Container": experimentsDetails.TargetContainer,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
log.Infof("[Wait]: Waiting for the %vs chaos duration", experimentsDetails.ChaosDuration)
common.WaitForDuration(experimentsDetails.ChaosDuration)
// It will verify that the restart count of container should increase after chaos injection
if err := verifyRestartCountAll(experimentsDetails, targetPodList, clients, restartCountBefore); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("target container is not restarted , err: %v", err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
//getRestartCount returns the restart count of the target container
func getRestartCount(targetPod apiv1.Pod, containerName string) int {
restartCount := 0
for _, container := range targetPod.Status.ContainerStatuses {
if container.Name == containerName {
restartCount = int(container.RestartCount)
break
}
}
return restartCount
}
// getRestartCountAll returns the restart count of the target container in each target pod
func getRestartCountAll(targetPodList apiv1.PodList, containerName string) []int {
restartCount := []int{}
for _, pod := range targetPodList.Items {
restartCount = append(restartCount, getRestartCount(pod, containerName))
}
return restartCount
}
// verifyRestartCount verifies that the restart count of the target container
// has increased after chaos injection
func verifyRestartCount(experimentsDetails *experimentTypes.ExperimentDetails, pod apiv1.Pod, clients clients.ClientSets, restartCountBefore int) error {
restartCountAfter := 0
err := retry.
Times(90).
Wait(1 * time.Second).
Try(func(attempt uint) error {
pod, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.AppNS).Get(context.Background(), pod.Name, v1.GetOptions{})
if err != nil {
return err
}
for _, container := range pod.Status.ContainerStatuses {
if container.Name == experimentsDetails.TargetContainer {
restartCountAfter = int(container.RestartCount)
break
}
}
return nil
})
if err != nil {
return err
}
// fail if the restart count has not increased
if restartCountAfter <= restartCountBefore {
return errors.Errorf("target container is not restarted")
}
log.Infof("restartCount of target container after chaos injection: %v", restartCountAfter)
return nil
}
// verifyRestartCountAll verifies, for every target pod, that the restart count
// of the target container has increased after chaos injection
func verifyRestartCountAll(experimentsDetails *experimentTypes.ExperimentDetails, podList apiv1.PodList, clients clients.ClientSets, restartCountBefore []int) error {
for index, pod := range podList.Items {
if err := verifyRestartCount(experimentsDetails, pod, clients, restartCountBefore[index]); err != nil {
return err
}
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"sudo",
"-E",
},
Args: []string{
"pumba",
"--random",
"--interval",
strconv.Itoa(experimentsDetails.ChaosInterval) + "s",
"kill",
"--signal",
experimentsDetails.Signal,
"re2:k8s_" + experimentsDetails.TargetContainer + "_" + appName,
},
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}


@@ -1,280 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodCPUHog contains the preparation steps before chaos injection
func PreparePodCPUHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// set up the tunables if provided as a range
litmusLIB.SetChaosTunables(experimentsDetails)
// Get the target pod details for the chaos execution;
// if target pods are not defined, it derives a random target pod list using the pod-affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide either appLabel or TARGET_PODS")
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stresses the CPU of each target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform cpu chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"CPUcores": experimentsDetails.CPUcores,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
// check the status of the helper pod; wait until it is running, else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode stresses the CPU of all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform cpu chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"CPUcores": experimentsDetails.CPUcores,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
// check the status of the helper pod; wait until it is running, else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"sudo",
"-E",
},
Args: getContainerArguments(experimentsDetails, appName),
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"pumba",
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stress-image",
experimentsDetails.StressImage,
"--stressors",
"--cpu " + experimentsDetails.CPUcores + " --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}


@@ -1,281 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodMemoryHog contains the preparation steps before chaos injection
func PreparePodMemoryHog(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// set up the tunables if provided as a range
litmusLIB.SetChaosTunables(experimentsDetails)
// Get the target pod details for the chaos execution;
// if target pods are not defined, it derives a random target pod list using the pod-affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide either appLabel or TARGET_PODS")
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stresses the memory of each target application serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform memory chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"MemoryBytes": experimentsDetails.MemoryConsumption,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
// check the status of the helper pod; wait until it is running, else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode stresses the memory of all target applications in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform memory chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"MemoryBytes": experimentsDetails.MemoryConsumption,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
// check the status of the helper pod; wait until it is running, else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"sudo",
"-E",
},
Args: getContainerArguments(experimentsDetails, appName),
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
stressArgs := []string{
"pumba",
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stress-image",
experimentsDetails.StressImage,
"--stressors",
"--cpu 1 --vm 1 --vm-bytes " + experimentsDetails.MemoryConsumption + "M --timeout " + strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
return stressArgs
}


@@ -1,43 +0,0 @@
package corruption
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkCorruptionChaos contains the steps to prepare and inject chaos
func PodNetworkCorruptionChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "corrupt", "--percent", experimentsDetails.NetworkPacketCorruptionPercentage)
return args, nil
}


@@ -1,43 +0,0 @@
package duplication
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkDuplicationChaos contains the steps to prepare and inject chaos
func PodNetworkDuplicationChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "duplicate", "--percent", experimentsDetails.NetworkPacketDuplicationPercentage)
return args, nil
}


@@ -1,43 +0,0 @@
package latency
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkLatencyChaos contains the steps to prepare and inject chaos
func PodNetworkLatencyChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "delay", "--time", strconv.Itoa(experimentsDetails.NetworkLatency))
return args, nil
}


@@ -1,43 +0,0 @@
package loss
import (
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/pumba/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
//PodNetworkLossChaos contains the steps to prepare and inject chaos
func PodNetworkLossChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
args, err := getContainerArguments(experimentsDetails)
if err != nil {
return err
}
return network_chaos.PrepareAndInjectChaos(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
// getContainerArguments derives the args for the pumba pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails) ([]string, error) {
baseArgs := []string{
"pumba",
"netem",
"--tc-image",
experimentsDetails.TCImage,
"--interface",
experimentsDetails.NetworkInterface,
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
}
args := baseArgs
args, err := network_chaos.AddTargetIpsArgs(experimentsDetails.DestinationIPs, experimentsDetails.DestinationHosts, args)
if err != nil {
return args, err
}
args = append(args, "loss", "--percent", experimentsDetails.NetworkPacketLossPercentage)
return args, nil
}


@@ -1,305 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PrepareAndInjectChaos contains the preparation and chaos injection steps
func PrepareAndInjectChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, args []string) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
//setup the tunables if provided in range
litmusLIB.SetChaosTunables(experimentsDetails)
switch experimentsDetails.NetworkChaosType {
case "network-loss":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketLossPercentage": experimentsDetails.NetworkPacketLossPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "network-latency":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkLatency": strconv.Itoa(experimentsDetails.NetworkLatency),
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "network-corruption":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketCorruptionPercentage": experimentsDetails.NetworkPacketCorruptionPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
case "network-duplication":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketDuplicationPercentage": experimentsDetails.NetworkPacketDuplicationPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
})
default:
return errors.Errorf("invalid experiment, please check the environment.go")
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, args, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode injects network chaos on all target application pods serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args []string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
})
// args contains details of the specific chaos injection
// constructing `argsWithRegex` by appending the regex matching this pod's
// pause container name to the shared base args
argsWithRegex := append(args, "re2:k8s_POD_"+pod.Name+"_"+experimentsDetails.AppNS)
log.Infof("Arguments for running %v are %v", experimentsDetails.ExperimentName, argsWithRegex)
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Spec.NodeName, runID, argsWithRegex, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, chaosDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode injects network chaos on all target application pods in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, args []string, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform network chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
})
// args contains details of the specific chaos injection
// constructing `argsWithRegex` by appending the regex matching this pod's
// pause container name to the shared base args
argsWithRegex := append(args, "re2:k8s_POD_"+pod.Name+"_"+experimentsDetails.AppNS)
log.Infof("Arguments for running %v are %v", experimentsDetails.ExperimentName, argsWithRegex)
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Spec.NodeName, runID, argsWithRegex, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, chaosDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appNodeName, runID string, args []string, labelSuffix string) error {
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: experimentsDetails.ExperimentName,
Image: experimentsDetails.LIBImage,
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
Command: []string{
"sudo",
"-E",
},
Args: args,
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}
// AddTargetIpsArgs inserts a comma-separated list of targetIPs (if provided by the user) into the pumba command/args
func AddTargetIpsArgs(targetIPs, targetHosts string, args []string) ([]string, error) {
targetIPs, err := network_chaos.GetTargetIps(targetIPs, targetHosts, clients.ClientSets{}, false)
if err != nil {
return nil, err
}
if targetIPs == "" {
return args, nil
}
ips := strings.Split(targetIPs, ",")
for i := range ips {
args = append(args, "--target", strings.TrimSpace(ips[i]))
}
return args, nil
}


@@ -1,306 +0,0 @@
package lib
import (
"context"
"strconv"
"strings"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/stress-chaos/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PreparePodIOStress contains the preparation steps before chaos injection
func PreparePodIOStress(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//setup the tunables if provided in range
litmusLIB.SetChaosTunables(experimentsDetails)
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail.Label == "" {
return errors.Errorf("please provide one of the appLabel or TARGET_PODS")
}
podsAffectedPerc, _ := strconv.Atoi(experimentsDetails.PodsAffectedPerc)
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, podsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
podNames := []string{}
for _, pod := range targetPodList.Items {
podNames = append(podNames, pod.Name)
}
log.Infof("Target pods list for chaos, %v", podNames)
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
}
}
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetPodList, clients, chaosDetails, resultDetails, eventsDetails); err != nil {
return err
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode injects IO stress on all target application pods serially (one by one)
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform IO stress chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
}
return nil
}
// injectChaosInParallelMode injects IO stress on all target application pods in parallel (all at once)
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
// creating the helper pod to perform IO stress chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
"NodeName": pod.Spec.NodeName,
"FilesystemUtilizationPercentage": experimentsDetails.FilesystemUtilizationPercentage,
"FilesystemUtilizationBytes": experimentsDetails.FilesystemUtilizationBytes,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
}
}
appLabel := "app=" + experimentsDetails.ExperimentName + "-helper-" + labelSuffix
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
for _, pod := range targetPodList.Items {
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, "pumba-stress")
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
}
return nil
}
// createHelperPod derives the attributes for the helper pod and creates it
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
helperPod := &apiv1.Pod{
TypeMeta: v1.TypeMeta{
Kind: "Pod",
APIVersion: "v1",
},
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
RestartPolicy: apiv1.RestartPolicyNever,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
NodeName: appNodeName,
Volumes: []apiv1.Volume{
{
Name: "dockersocket",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: experimentsDetails.SocketPath,
},
},
},
},
Containers: []apiv1.Container{
{
Name: "pumba-stress",
Image: experimentsDetails.LIBImage,
Command: []string{
"sudo",
"-E",
},
Args: getContainerArguments(experimentsDetails, appName),
Env: []apiv1.EnvVar{
{
Name: "DOCKER_HOST",
Value: "unix://" + experimentsDetails.SocketPath,
},
},
Resources: chaosDetails.Resources,
VolumeMounts: []apiv1.VolumeMount{
{
Name: "dockersocket",
MountPath: experimentsDetails.SocketPath,
},
},
ImagePullPolicy: apiv1.PullPolicy(experimentsDetails.LIBImagePullPolicy),
SecurityContext: &apiv1.SecurityContext{
Capabilities: &apiv1.Capabilities{
Add: []apiv1.Capability{
"SYS_ADMIN",
},
},
},
},
},
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
}
// getContainerArguments derives the args for the pumba stress helper pod
func getContainerArguments(experimentsDetails *experimentTypes.ExperimentDetails, appName string) []string {
var hddbytes string
if experimentsDetails.FilesystemUtilizationBytes == "0" {
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = "10%"
log.Info("Neither FilesystemUtilizationPercentage nor FilesystemUtilizationBytes provided, proceeding with a default FilesystemUtilizationPercentage value of 10%")
} else {
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
}
} else {
if experimentsDetails.FilesystemUtilizationPercentage == "0" {
hddbytes = experimentsDetails.FilesystemUtilizationBytes + "G"
} else {
hddbytes = experimentsDetails.FilesystemUtilizationPercentage + "%"
log.Warn("Both FsUtilPercentage & FsUtilBytes provided as inputs, using the FsUtilPercentage value to proceed with stress exp")
}
}
stressArgs := []string{
"pumba",
"--log-level",
"debug",
"--label",
"io.kubernetes.pod.name=" + appName,
"stress",
"--duration",
strconv.Itoa(experimentsDetails.ChaosDuration) + "s",
"--stress-image",
experimentsDetails.StressImage,
"--stressors",
}
args := stressArgs
if experimentsDetails.VolumeMountPath == "" {
args = append(args, "--cpu 1 --io "+experimentsDetails.NumberOfWorkers+" --hdd "+experimentsDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes+" --timeout "+strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
} else {
args = append(args, "--cpu 1 --io "+experimentsDetails.NumberOfWorkers+" --hdd "+experimentsDetails.NumberOfWorkers+" --hdd-bytes "+hddbytes+" --temp-path "+experimentsDetails.VolumeMountPath+" --timeout "+strconv.Itoa(experimentsDetails.ChaosDuration)+"s")
}
return args
}


@@ -1,7 +1,11 @@
package lib
import (
"context"
"fmt"
"os"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/palantir/stacktrace"
"os/signal"
"syscall"
"time"
@@ -13,7 +17,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
litmusexec "github.com/litmuschaos/litmus-go/pkg/utils/exec"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
)
@@ -25,18 +28,24 @@ func injectChaos(experimentsDetails *experimentTypes.ExperimentDetails, podName
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("unable to run command inside target container, err: %v", err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to inject chaos: %s", err.Error())}
}
return nil
}
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
@@ -45,23 +54,29 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
}
log.Infof("Target pods list for chaos, %v", podNames)
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
return runChaos(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
}
func runChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
return runChaos(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
}
func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
var endTime <-chan time.Time
timeDelay := time.Duration(experimentsDetails.ChaosDuration) * time.Second
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
for _, pod := range targetPodList.Items {
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on " + pod.Name + " pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
@@ -99,14 +114,17 @@ func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodLi
}
}
if err := killChaos(experimentsDetails, pod.Name, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
//PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectChaos")
// defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -114,8 +132,8 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
return err
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
return stacktrace.Propagate(err, "could not execute experiment")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
@@ -134,7 +152,7 @@ func killChaos(experimentsDetails *experimentTypes.ExperimentDetails, podName st
litmusexec.SetExecCommandAttributes(&execCommandDetails, podName, experimentsDetails.TargetContainer, experimentsDetails.AppNS)
_, err := litmusexec.Exec(&execCommandDetails, clients, command)
if err != nil {
return errors.Errorf("unable to kill the process in %v pod, err: %v", podName, err)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{podName: %s, namespace: %s}", podName, experimentsDetails.AppNS), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}
}
return nil
}


@@ -1,6 +1,9 @@
package lib
import (
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/palantir/stacktrace"
"context"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
@@ -10,20 +13,26 @@ import (
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func experimentExecution(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
if experimentsDetails.TargetPods == "" && chaosDetails.AppDetail == nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "provide one of the appLabel or TARGET_PODS"}
}
// Get the target pod details for the chaos execution
// if the target pod is not defined it will derive the random target pod list using pod affected percentage
targetPodList, err := common.GetPodList(experimentsDetails.TargetPods, experimentsDetails.PodsAffectedPerc, clients, chaosDetails)
if err != nil {
return err
}
if err != nil {
return stacktrace.Propagate(err, "could not get target pods")
}
podNames := []string{}
for _, pod := range targetPodList.Items {
@@ -31,51 +40,48 @@ func experimentExecution(experimentsDetails *experimentTypes.ExperimentDetails,
}
log.Infof("Target pods list for chaos, %v", podNames)
//Get the target container name of the application pod
if experimentsDetails.TargetContainer == "" {
experimentsDetails.TargetContainer, err = common.GetTargetContainer(experimentsDetails.AppNS, targetPodList.Items[0].Name, clients)
if err != nil {
return errors.Errorf("unable to get the target container name, err: %v", err)
}
}
// Getting the serviceAccountName, need permission inside helper pod to create the events
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return errors.Errorf("unable to get the serviceAccountName, err: %v", err)
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
if experimentsDetails.EngineName != "" {
if err := common.SetHelperData(chaosDetails, experimentsDetails.SetHelperData, clients); err != nil {
return err
return stacktrace.Propagate(err, "could not set helper data")
}
}
return runChaos(experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
return runChaos(ctx, experimentsDetails, targetPodList, clients, resultDetails, eventsDetails, chaosDetails)
}
func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func runChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList corev1.PodList, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on target pod"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
labelSuffix := common.GetRunID()
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
experimentsDetails.IsTargetContainerProvided = experimentsDetails.TargetContainer != ""
// creating the helper pod to perform container kill chaos
for _, pod := range targetPodList.Items {
runID := common.GetRunID()
//Get the target container name of the application pod
if !experimentsDetails.IsTargetContainerProvided {
experimentsDetails.TargetContainer = pod.Spec.Containers[0].Name
}
runID := stringutils.GetRunID()
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"Target Pod": pod.Name,
@@ -83,34 +89,16 @@ func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodLi
"Target Container": experimentsDetails.TargetContainer,
})
if err := createHelperPod(experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID, labelSuffix); err != nil {
return errors.Errorf("unable to create the helper pod, err: %v", err)
if err := createHelperPod(ctx, experimentsDetails, clients, chaosDetails, pod.Name, pod.Spec.NodeName, runID); err != nil {
return stacktrace.Propagate(err, "could not create helper pod")
}
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
appLabel := "name=" + experimentsDetails.ExperimentName + "-helper-" + runID
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return errors.Errorf("helper pod is not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+runID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return errors.Errorf("unable to delete the helper pod, err: %v", err)
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@@ -118,7 +106,10 @@ func runChaos(experimentsDetails *experimentTypes.ExperimentDetails, targetPodLi
}
//PrepareChaos contains the preparation steps before chaos injection
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Prepare[name-your-chaos]Fault")
// defer span.End()
//Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
@@ -126,8 +117,8 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
common.WaitForDuration(experimentsDetails.RampTime)
}
//Starting the CPU stress experiment
if err := experimentExecution(experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
return err
if err := experimentExecution(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails);err != nil {
return stacktrace.Propagate(err, "could not execute chaos")
}
//Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
@@ -138,13 +129,16 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
}
// createHelperPod derive the attributes for helper pod and create the helper pod
func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, appName, appNodeName, runID, labelSuffix string) error {
func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, chaosDetails *types.ChaosDetails, targets, nodeName, runID string) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Create[name-your-chaos]FaultHelperPod")
// defer span.End()
helperPod := &corev1.Pod{
ObjectMeta: v1.ObjectMeta{
Name: experimentsDetails.ExperimentName + "-helper-" + runID,
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, labelSuffix, experimentsDetails.ExperimentName),
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: corev1.PodSpec{
@@ -172,5 +166,8 @@ func createHelperPod(experimentsDetails *experimentTypes.ExperimentDetails, clie
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
return err
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}


@@ -1,6 +1,7 @@
package lib
import (
"context"
"os"
"os/signal"
"strings"
@@ -14,7 +15,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/{{ .Category }}/{{ .Name }}/types"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
)
var (
@@ -22,8 +22,11 @@ var (
inject, abort chan os.Signal
)
//PrepareChaos contains the prepration and injection steps for the experiment
func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
//PrepareChaos contains the preparation and injection steps for the experiment
func PrepareChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Prepare[name-your-chaos]Fault")
// defer span.End()
// inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
@@ -46,7 +49,7 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
// THIS TEMPLATE CONTAINS THE SELECTION BY ID FOR TAG YOU NEED TO ADD/CALL A FUNCTION HERE
targetIDList := strings.Split(experimentsDetails.TargetID, ",")
if experimentsDetails.TargetID == "" {
return errors.Errorf("no target id found to perform chaos on")
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no target id found"}
}
// watching for the abort signal and revert the chaos
@@ -54,15 +57,15 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInSerialMode(ctx, experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return err
if err = injectChaosInParallelMode(ctx, experimentsDetails, targetIDList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return errors.Errorf("%v sequence is not supported", experimentsDetails.Sequence)
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
//Waiting for the ramp time after chaos injection
@@ -74,7 +77,10 @@ func PrepareChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients
}
//injectChaosInSerialMode will inject the chaos on the target one after other
func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Inject[name-your-chaos]FaultInSerialMode")
// defer span.End()
select {
case <-inject:
@@ -112,7 +118,7 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
// The OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}
@@ -137,7 +143,10 @@ func injectChaosInSerialMode(experimentsDetails *experimentTypes.ExperimentDetai
}
// injectChaosInParallelMode will inject the chaos on the target all at once
func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetIDList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
// @TODO: setup tracing
// ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "Inject[name-your-chaos]FaultInParallelMode")
// defer span.End()
select {
case <-inject:
@@ -178,7 +187,7 @@ func injectChaosInParallelMode(experimentsDetails *experimentTypes.ExperimentDet
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return err
}
}


@@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")


@@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")


@@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.AppKind = types.Getenv("APP_KIND", "")


@@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")


@@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.AppNS = types.Getenv("APP_NAMESPACE", "")
experimentDetails.AppLabel = types.Getenv("APP_LABEL", "")
experimentDetails.AppKind = types.Getenv("APP_KIND", "")


@@ -20,7 +20,6 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "10"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosLib = types.Getenv("LIB", "litmus")
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")


@@ -1,11 +1,11 @@
package experiment
import (
"context"
"os"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
aws "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/events"
@@ -20,7 +20,7 @@ import (
)
// Experiment contains steps to inject chaos
func Experiment(clients clients.ClientSets){
func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
@@ -38,19 +38,18 @@ func Experiment(clients clients.ClientSets){
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@@ -80,8 +79,7 @@ func Experiment(clients clients.ClientSets){
log.Info("[Status]: Verify that the aws ec2 instances are in running state (pre-chaos)")
if err := aws.InstanceStatusCheckByID(experimentsDetails.TargetID, experimentsDetails.Region); err != nil {
log.Errorf("failed to get the ec2 instance status, err: %v", err)
failStep := "[pre-chaos]: Failed to verify the AWS ec2 instance status, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: EC2 instance is in running state")
@@ -93,13 +91,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@@ -113,25 +110,17 @@ func Experiment(clients clients.ClientSets){
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib
switch experimentsDetails.ChaosLib {
case "litmus":
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@@ -142,8 +131,7 @@ func Experiment(clients clients.ClientSets){
log.Info("[Status]: Verify that the aws ec2 instances are in running state (post-chaos)")
if err := aws.InstanceStatusCheckByID(experimentsDetails.TargetID, experimentsDetails.Region); err != nil {
log.Errorf("failed to get the ec2 instance status, err: %v", err)
failStep := "[post-chaos]: Failed to verify the AWS ec2 instance status, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: EC2 instance is in running state (post chaos)")
@@ -155,13 +143,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@@ -177,17 +164,13 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to marked the verdict as pass/fail
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
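The hunk above folds the inline verdict if-block into a single `types.GetChaosResultVerdictEvent` call. Its behavior, as implied by the removed lines, can be sketched as (a hypothetical stand-in using plain strings instead of the real `types` constants):

```go
package main

import "fmt"

// getVerdictEvent maps a chaos result verdict to an event reason and event
// type, mirroring the inline if-block the refactor removes: a "Pass" verdict
// yields a Normal PassVerdict event, anything else a Warning FailVerdict event.
func getVerdictEvent(verdict string) (reason, eventType string) {
	if verdict == "Pass" {
		return "PassVerdict", "Normal"
	}
	return "FailVerdict", "Warning"
}

func main() {
	reason, eventType := getVerdictEvent("Fail")
	fmt.Println(reason, eventType)
}
```

Centralizing the mapping keeps every experiment's end-of-test event consistent instead of each copy of the template re-deriving it.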


@@ -1,6 +1,7 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
@@ -20,7 +21,7 @@ import (
)
// Experiment contains steps to inject chaos
func Experiment(clients clients.ClientSets){
func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
@@ -39,19 +40,18 @@ func Experiment(clients clients.ClientSets){
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@@ -77,8 +77,7 @@ func Experiment(clients clients.ClientSets){
// Setting up Azure Subscription ID
if experimentsDetails.SubscriptionID, err = azureCommon.GetSubscriptionID(); err != nil {
log.Errorf("fail to get the subscription id, err: %v", err)
failStep := "[pre-chaos]: Failed to get the subscription ID for authentication, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@@ -89,8 +88,7 @@ func Experiment(clients clients.ClientSets){
//Verify the azure target instance is running (pre-chaos)
if err := azureStatus.InstanceStatusCheckByName(experimentsDetails.TargetID, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup); err != nil {
log.Errorf("failed to get the azure instance status, err: %v", err)
failStep := "[pre-chaos]: Failed to verify the azure instance status, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: Azure instance(s) is in running state (pre-chaos)")
@@ -102,13 +100,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@@ -122,25 +119,17 @@ func Experiment(clients clients.ClientSets){
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib
switch experimentsDetails.ChaosLib {
case "litmus":
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@@ -149,8 +138,7 @@ func Experiment(clients clients.ClientSets){
//Verify the azure instance is running (post chaos)
if err := azureStatus.InstanceStatusCheckByName(experimentsDetails.TargetID, experimentsDetails.ScaleSet, experimentsDetails.SubscriptionID, experimentsDetails.ResourceGroup); err != nil {
log.Errorf("failed to get the azure instance status, err: %v", err)
failStep := "[pre-chaos]: Failed to update the azure instance status, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: Azure instance is in running state (post chaos)")
@@ -161,13 +149,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@ -183,17 +170,13 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to marked the verdict as pass/fail
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
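The hunk above replaces the inline verdict-to-event mapping with a single `types.GetChaosResultVerdictEvent` call. A minimal sketch of what such a helper plausibly does, with illustrative names and string constants standing in for the real `types.PassVerdict`/`types.FailVerdict` values:

```go
package main

import "fmt"

// GetChaosResultVerdictEvent is a hypothetical stand-in for the
// types.GetChaosResultVerdictEvent helper the diff introduces: it maps
// a chaos verdict to an event reason and Kubernetes event type,
// replacing the if/else block each experiment previously duplicated.
func GetChaosResultVerdictEvent(verdict string) (reason, eventType string) {
	if verdict == "Pass" {
		return "Pass", "Normal" // passing verdict -> Normal event
	}
	return "Fail", "Warning" // any other verdict -> Warning event
}

func main() {
	reason, eventType := GetChaosResultVerdictEvent("Pass")
	fmt.Println(reason, eventType)
	reason, eventType = GetChaosResultVerdictEvent("Fail")
	fmt.Println(reason, eventType)
}
```

Centralizing the mapping keeps every experiment's end-of-test event consistent and shrinks each template by several lines.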


@ -43,9 +43,6 @@ spec:
- name: CHAOS_INTERVAL
value: ''
- name: LIB
value: ''
- name: RAMP_TIME
value: ''


@ -1,6 +1,7 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
@ -37,19 +38,18 @@ func Experiment(clients clients.ClientSets){
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@ -75,16 +75,14 @@ func Experiment(clients clients.ClientSets){
computeService, err := gcp.GetGCPComputeService()
if err != nil {
log.Errorf("failed to obtain a gcp compute service, err: %v", err)
failStep := "[pre-chaos]: Failed to obtain a gcp compute service, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// Verify that the GCP VM instance(s) is in RUNNING state (pre-chaos)
if err := gcp.InstanceStatusCheckByName(computeService, experimentsDetails.ManagedInstanceGroup, experimentsDetails.Delay, experimentsDetails.Timeout, "pre-chaos", experimentsDetails.TargetID, experimentsDetails.GCPProjectID, experimentsDetails.InstanceZone); err != nil {
log.Errorf("failed to get the vm instance status, err: %v", err)
failStep := "[pre-chaos]: Failed to verify the GCP VM instance status, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@ -101,13 +99,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@ -121,25 +118,18 @@ func Experiment(clients clients.ClientSets){
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib
switch experimentsDetails.ChaosLib {
case "litmus":
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@ -148,8 +138,7 @@ func Experiment(clients clients.ClientSets){
//Verify the GCP VM instance is in RUNNING status (post-chaos)
if err := gcp.InstanceStatusCheckByName(computeService, experimentsDetails.ManagedInstanceGroup, experimentsDetails.Delay, experimentsDetails.Timeout, "post-chaos", experimentsDetails.TargetID, experimentsDetails.GCPProjectID, experimentsDetails.InstanceZone); err != nil {
log.Errorf("failed to get the vm instance status, err: %v", err)
failStep := "[post-chaos]: Failed to verify the GCP VM instance status, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@ -161,13 +150,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@ -183,17 +171,13 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to marked the verdict as pass/fail
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")


@ -1,6 +1,7 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
@ -19,7 +20,7 @@ import (
)
// Experiment contains steps to inject chaos
func Experiment(clients clients.ClientSets){
func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
@ -37,19 +38,18 @@ func Experiment(clients clients.ClientSets){
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@ -78,25 +78,23 @@ func Experiment(clients clients.ClientSets){
//PRE-CHAOS APPLICATION STATUS CHECK
if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)")
if err := status.AUTStatusCheck(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.TargetContainer, experimentsDetails.Timeout, experimentsDetails.Delay, clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err)
failStep := "[pre-chaos]: Failed to verify that the AUT (Application Under Test) is in running state, err: " + err.Error()
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
if err := status.AUTStatusCheck(clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err)
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
{{ if eq .AuxiliaryAppCheck true }}
//PRE-CHAOS AUXILIARY APPLICATION STATUS CHECK
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running (pre-chaos)")
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients);err != nil {
log.Errorf("Auxiliary Application status check failed, err: %v", err)
failStep := "[pre-chaos]: Failed to verify that the Auxiliary Applications are in running state, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
log.Info("[Status]: Verify that the Auxiliary Applications are running (pre-chaos)")
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Errorf("Auxiliary Application status check failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
{{- end }}
@ -107,13 +105,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@ -128,24 +125,16 @@ func Experiment(clients clients.ClientSets){
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib
switch experimentsDetails.ChaosLib {
case "litmus":
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@ -154,12 +143,11 @@ func Experiment(clients clients.ClientSets){
//POST-CHAOS APPLICATION STATUS CHECK
if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (post-chaos)")
if err := status.AUTStatusCheck(experimentsDetails.AppNS, experimentsDetails.AppLabel, experimentsDetails.TargetContainer, experimentsDetails.Timeout, experimentsDetails.Delay, clients, &chaosDetails); err != nil {
if err := status.AUTStatusCheck(clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err)
failStep := "[post-chaos]: Failed to verify that the AUT (Application Under Test) is running, err: " + err.Error()
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
@ -167,10 +155,9 @@ func Experiment(clients clients.ClientSets){
//POST-CHAOS AUXILIARY APPLICATION STATUS CHECK
if experimentsDetails.AuxiliaryAppInfo != "" {
log.Info("[Status]: Verify that the Auxiliary Applications are running (post-chaos)")
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients);err != nil {
if err := status.CheckAuxiliaryApplicationStatus(experimentsDetails.AuxiliaryAppInfo, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
log.Errorf("Auxiliary Application status check failed, err: %v", err)
failStep := "[post-chaos]: Failed to verify that the Auxiliary Applications are running, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
@ -182,13 +169,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@ -204,17 +190,13 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to marked the verdict as pass/fail
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason := types.PassVerdict
eventType := "Normal"
if resultDetails.Verdict != "Pass" {
reason = types.FailVerdict
eventType = "Warning"
}
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")


@ -45,11 +45,6 @@ spec:
- name: RAMP_TIME
value: ''
## env var that describes the library used to execute the chaos
## default: litmus. Supported values: litmus, powerfulseal, chaoskube
- name: LIB
value: ''
# provide the chaos namespace
- name: CHAOS_NAMESPACE
value: ''


@ -1,8 +1,10 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/{{ .Name }}/lib"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
@ -19,7 +21,7 @@ import (
)
// Experiment contains steps to inject chaos
func Experiment(clients clients.ClientSets){
func Experiment(ctx context.Context, clients clients.ClientSets){
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
@ -48,8 +50,7 @@ func Experiment(clients clients.ClientSets){
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT");err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
failStep := "[pre-chaos]: Failed to update the chaos result of pod-delete experiment (SOT), err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@ -74,8 +75,7 @@ func Experiment(clients clients.ClientSets){
// GET SESSION ID TO LOGIN TO VCENTER
cookie, err := vmware.GetVcenterSessionID(experimentsDetails.VcenterServer, experimentsDetails.VcenterUser, experimentsDetails.VcenterPass)
if err != nil {
failStep := "[pre-chaos]: Failed to obtain the Vcenter session ID, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
log.Errorf("Vcenter Login failed, err: %v", err)
return
}
@ -87,8 +87,7 @@ func Experiment(clients clients.ClientSets){
// PRE-CHAOS VM STATUS CHECK
if err := vmware.VMStatusCheck(experimentsDetails.VcenterServer, experimentsDetails.TargetID, cookie); err != nil {
log.Errorf("Failed to get the VM status, err: %v", err)
failStep := "[pre-chaos]: Failed to verify the VM status, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Verification]: VMs are in running state (pre-chaos)")
@ -100,13 +99,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails);err != nil {
log.Errorf("Probe Failed, err: %v", err)
failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@ -120,25 +118,18 @@ func Experiment(clients clients.ClientSets){
// THE BUSINESS LOGIC OF THE ACTUAL CHAOS
// IT CAN BE A NEW CHAOSLIB YOU HAVE CREATED SPECIALLY FOR THIS EXPERIMENT OR ANY EXISTING ONE
// @TODO: user INVOKE-CHAOSLIB
// Including the litmus lib
switch experimentsDetails.ChaosLib {
case "litmus":
if err := litmusLIB.PrepareChaos(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
default:
log.Error("[Invalid]: Please Provide the correct LIB")
failStep := "[chaos]: no match found for specified lib"
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PrepareChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// @TODO: user POST-CHAOS-CHECK
// ADD A POST-CHAOS CHECK OF YOUR CHOICE HERE
@ -148,8 +139,7 @@ func Experiment(clients clients.ClientSets){
log.Info("[Status]: Verify that the IUT (Instance Under Test) is running (post-chaos)")
if err := vmware.VMStatusCheck(experimentsDetails.VcenterServer, experimentsDetails.TargetID, cookie); err != nil {
log.Errorf("Failed to get the VM status, err: %v", err)
failStep := "[post-chaos]: Failed to get the VM status, err: " + err.Error()
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Verification]: VMs are in running state (post-chaos)")
@ -160,13 +150,12 @@ func Experiment(clients clients.ClientSets){
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails);err != nil {
log.Errorf("Probes Failed, err: %v", err)
failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@ -182,6 +171,7 @@ func Experiment(clients clients.ClientSets){
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT");err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}


@ -5,7 +5,7 @@ import (
)
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int
ChaosInterval int
RampTime int
ChaosLib string
ChaosUID clientTypes.UID
InstanceID string
ChaosNamespace string


@ -5,7 +5,7 @@ import (
)
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int
ChaosInterval int
RampTime int
ChaosLib string
ChaosUID clientTypes.UID
InstanceID string
ChaosNamespace string


@ -5,7 +5,7 @@ import (
)
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int
ChaosInterval int
RampTime int
ChaosLib string
AppNS string
AppLabel string
AppKind string
@ -31,4 +30,5 @@ type ExperimentDetails struct {
PodsAffectedPerc int
TargetPods string
LIBImagePullPolicy string
IsTargetContainerProvided bool
}


@ -5,7 +5,7 @@ import (
)
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int
ChaosInterval int
RampTime int
ChaosLib string
ChaosUID clientTypes.UID
InstanceID string
ChaosNamespace string


@ -5,7 +5,7 @@ import (
)
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int
ChaosInterval int
RampTime int
ChaosLib string
AppNS string
AppLabel string
AppKind string
@ -32,4 +31,5 @@ type ExperimentDetails struct {
LIBImage string
SetHelperData string
ChaosServiceAccount string
IsTargetContainerProvided bool
}


@ -5,7 +5,7 @@ import (
)
// ADD THE ATTRIBUTES OF YOUR CHOICE HERE
// FEW MENDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// FEW MANDATORY ATTRIBUTES ARE ADDED BY DEFAULT
// ExperimentDetails is for collecting all the experiment-related details
type ExperimentDetails struct {
@ -14,7 +14,6 @@ type ExperimentDetails struct {
ChaosDuration int
ChaosInterval int
RampTime int
ChaosLib string
ChaosUID clientTypes.UID
InstanceID string
ChaosNamespace string


@ -1,13 +1,14 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/aws-ssm-chaos/lib/ssm"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/clients"
ec2 "github.com/litmuschaos/litmus-go/pkg/cloud/aws/ec2"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/ssm"
"github.com/litmuschaos/litmus-go/pkg/events"
@ -20,7 +21,7 @@ import (
)
// AWSSSMChaosByID inject the ssm chaos on ec2 instance
func AWSSSMChaosByID(clients clients.ClientSets) {
func AWSSSMChaosByID(ctx context.Context, clients clients.ClientSets) {
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
@ -38,9 +39,9 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Initialize the probe details. Bail out upon error, as we haven't entered exp business logic yet
if err := probe.InitializeProbesInChaosResultDetails(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}
}
@ -48,9 +49,8 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
//Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT"); err != nil {
-		log.Errorf("Unable to Create the Chaos Result, err: %v", err)
-		failStep := "[pre-chaos]: Failed to update the chaos result of ec2 terminate experiment (SOT), err: " + err.Error()
-		result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
+		log.Errorf("Unable to create the chaosresult: %v", err)
+		result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
@@ -60,8 +60,9 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
// generating the event in chaosresult to marked the verdict as awaited
msg := "experiment: " + experimentsDetails.ExperimentName + ", Result: Awaited"
types.SetResultEventAttributes(&eventsDetails, types.AwaitedVerdict, msg, "Normal", &resultDetails)
-		events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
+		if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult"); eventErr != nil {
+			log.Errorf("Failed to create %v event inside chaosresult", types.AwaitedVerdict)
+		}
// Calling AbortWatcher go routine, it will continuously watch for the abort signal and generate the required events and result
go common.AbortWatcherWithoutExit(experimentsDetails.ExperimentName, clients, &resultDetails, &chaosDetails, &eventsDetails)
@@ -80,73 +81,67 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
-		if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails); err != nil {
-			log.Errorf("Probe Failed, err: %v", err)
-			failStep := "[pre-chaos]: Failed while running probes, err: " + err.Error()
+		if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails); err != nil {
+			log.Errorf("Probe Failed: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
-			events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
-			result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
+			if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
+				log.Errorf("Failed to create %v event inside chaosengine", types.PreChaosCheck)
+			}
+			result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// generating the events for the pre-chaos check
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Normal", &chaosDetails)
-		events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
+		if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
+			log.Errorf("Failed to create %v event inside chaosengine", types.PreChaosCheck)
+		}
}
//Verify that the instance should have permission to perform ssm api calls
if err := ssm.CheckInstanceInformation(&experimentsDetails); err != nil {
-		log.Errorf("failed perform ssm api calls, err: %v", err)
-		failStep := "[pre-chaos]: Failed to verify to make SSM api calls, err: " + err.Error()
-		result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
+		log.Errorf("Failed perform ssm api calls: %v", err)
+		result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
if chaosDetails.DefaultHealthCheck {
//Verify the aws ec2 instance is running (pre chaos)
if err := ec2.InstanceStatusCheckByID(experimentsDetails.EC2InstanceID, experimentsDetails.Region); err != nil {
-			log.Errorf("failed to get the ec2 instance status, err: %v", err)
-			failStep := "[pre-chaos]: Failed to verify the AWS ec2 instance status, err: " + err.Error()
-			result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
+			log.Errorf("Failed to get the ec2 instance status: %v", err)
+			result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: EC2 instance is in running state")
}
-	// Including the litmus lib for aws-ssm-chaos-by-id
-	switch experimentsDetails.ChaosLib {
-	case "litmus":
-		if err := litmusLIB.PrepareAWSSSMChaosByID(&experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
-			log.Errorf("Chaos injection failed, err: %v", err)
-			failStep := "[chaos]: Failed inside the chaoslib, err: " + err.Error()
-			result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
-			//Delete the ssm document on the given aws service monitoring docs
-			if experimentsDetails.IsDocsUploaded {
-				log.Info("[Recovery]: Delete the uploaded aws ssm docs")
-				if err := ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region); err != nil {
-					log.Errorf("fail to delete ssm doc, err: %v", err)
-				}
-			}
-			return
-		}
-	default:
-		log.Error("[Invalid]: Please Provide the correct LIB")
-		failStep := "[chaos]: no match was found for the specified lib"
-		result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
-		return
-	}
+	chaosDetails.Phase = types.ChaosInjectPhase
+	if err := litmusLIB.PrepareAWSSSMChaosByID(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
+		log.Errorf("Chaos injection failed: %v", err)
+		result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
+		//Delete the ssm document on the given aws service monitoring docs
+		if experimentsDetails.IsDocsUploaded {
+			log.Info("[Recovery]: Delete the uploaded aws ssm docs")
+			if err := ssm.SSMDeleteDocument(experimentsDetails.DocumentName, experimentsDetails.Region); err != nil {
+				log.Errorf("Failed to delete ssm doc: %v", err)
+			}
+		}
+		return
+	}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
if chaosDetails.DefaultHealthCheck {
//Verify the aws ec2 instance is running (post chaos)
if err := ec2.InstanceStatusCheckByID(experimentsDetails.EC2InstanceID, experimentsDetails.Region); err != nil {
-			log.Errorf("failed to get the ec2 instance status, err: %v", err)
-			failStep := "[post-chaos]: Failed to verify the AWS ec2 instance status, err: " + err.Error()
-			result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
+			log.Errorf("Failed to get the ec2 instance status: %v", err)
+			result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: EC2 instance is in running state (post chaos)")
@@ -158,13 +153,14 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
-		if err := probe.RunProbes(&chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails); err != nil {
-			log.Errorf("Probes Failed, err: %v", err)
-			failStep := "[post-chaos]: Failed while running probes, err: " + err.Error()
+		if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails); err != nil {
+			log.Errorf("Probes Failed: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
-			events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
-			result.RecordAfterFailure(&chaosDetails, &resultDetails, failStep, clients, &eventsDetails)
+			if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
+				log.Errorf("Failed to create %v event inside chaosengine", types.PostChaosCheck)
+			}
+			result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
@@ -172,31 +168,30 @@ func AWSSSMChaosByID(clients clients.ClientSets) {
// generating post chaos event
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Normal", &chaosDetails)
-		events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
+		if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
+			log.Errorf("Failed to create %v event inside chaosengine", types.PostChaosCheck)
+		}
}
//Updating the chaosResult in the end of experiment
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT"); err != nil {
-		log.Errorf("Unable to Update the Chaos Result, err: %v", err)
+		log.Errorf("Unable to update the chaosresult: %v", err)
return
}
// generating the event in chaosresult to marked the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
-	reason := types.PassVerdict
-	eventType := "Normal"
-	if resultDetails.Verdict != "Pass" {
-		reason = types.FailVerdict
-		eventType = "Warning"
-	}
+	reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
-	events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
+	if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult"); eventErr != nil {
+		log.Errorf("Failed to create %v event inside chaosresult", reason)
+	}
if experimentsDetails.EngineName != "" {
msg := experimentsDetails.ExperimentName + " experiment has been " + string(resultDetails.Verdict) + "ed"
types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails)
-		events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
+		if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
+			log.Errorf("Failed to create %v event inside chaosengine", types.Summary)
+		}
}
}
