The bridge between chaos operator and chaos experiment! Lifecycle manager for chaos experiments
Namkyu Park 173b5ff688
feat: implement otel telemetry sdk for distributed tracing (#221)
* feat: implement otel sdk

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: update endpoint

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: fix context logic

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* fix: make otel optional

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>

* (chore): Fix the release pipeline (#224)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>

* fix: add logs

Signed-off-by: namkyu1999 <lak9348@gmail.com>

* chore

Signed-off-by: namkyu1999 <lak9348@gmail.com>

* feat: go version from 1.20 to 1.22

Signed-off-by: namkyu1999 <lak9348@gmail.com>

---------

Signed-off-by: namkyu1999 <lak9348@konkuk.ac.kr>
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: namkyu1999 <lak9348@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-04-02 11:50:02 +05:30
.github feat: implement otel telemetry sdk for distributed tracing (#221) 2025-04-02 11:50:02 +05:30
bin feat: implement otel telemetry sdk for distributed tracing (#221) 2025-04-02 11:50:02 +05:30
build feat: migration base image 2024-07-16 10:48:40 +09:00
pkg feat: implement otel telemetry sdk for distributed tracing (#221) 2025-04-02 11:50:02 +05:30
tests fix(logs): fix the rank warning in logs (#220) 2024-06-01 12:20:37 +05:30
.gitignore chore(resourceRequirements): Adding resource requirements in chaos pod (#93) 2020-09-29 19:06:43 +05:30
CONTRIBUTING.md chore(charts): update readme, contributor guide and github actions 2021-10-11 13:50:07 +05:30
LICENSE chore(charts): update readme, contributor guide and github actions 2021-10-11 13:50:07 +05:30
Makefile fixing the docker buildx progess argument (#202) 2023-06-20 11:54:01 +05:30
NOTICE.md (chore)notice: add license notice (#51) 2020-03-13 22:44:38 +05:30
README.md Remove BCH banner from readme 2024-04-26 14:25:22 +05:30
go.mod feat: implement otel telemetry sdk for distributed tracing (#221) 2025-04-02 11:50:02 +05:30
go.sum feat: implement otel telemetry sdk for distributed tracing (#221) 2025-04-02 11:50:02 +05:30

README.md

CHAOS RUNNER


The Chaos Runner is an operational bridge between the Chaos Operator and the LitmusChaos experiment jobs.

  • It is launched as a pod in the chaos namespace (where the ChaosEngine is running) and is reconciled by the Litmus Chaos Operator.
  • It reads the chaos parameters from the experiment CR and overrides them with values from the ChaosEngine. After validating dependencies such as configmap/secret volumes, it constructs the experiment job and launches it (along with the monitor/chaos-exporter deployment if the engine's monitoring policy is true).
  • It monitors the experiment pod until completion.
  • It cleans up the experiment job after completion, based on the engine's jobCleanUpPolicy (delete or retain).
  • It patches the ChaosEngine with the verdict of the experiment and creates events for the different phases in the ChaosEngine.
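
The override and cleanup steps above can be sketched in plain Go. This is only an illustration, not the runner's actual code: the `EnvVar` type, the `overrideEnv` and `shouldDeleteJob` helpers, and the sample variable names are hypothetical stand-ins, and treating any policy other than `delete` as "retain" is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// EnvVar is a hypothetical stand-in for a container environment entry.
type EnvVar struct {
	Name  string
	Value string
}

// overrideEnv merges experiment defaults with engine-supplied values.
// Engine values win on name collisions; engine-only variables are added.
// The result is sorted by name so the output is deterministic.
func overrideEnv(defaults, overrides []EnvVar) []EnvVar {
	merged := make(map[string]string, len(defaults)+len(overrides))
	for _, e := range defaults {
		merged[e.Name] = e.Value
	}
	for _, e := range overrides {
		merged[e.Name] = e.Value
	}
	names := make([]string, 0, len(merged))
	for name := range merged {
		names = append(names, name)
	}
	sort.Strings(names)
	out := make([]EnvVar, 0, len(names))
	for _, name := range names {
		out = append(out, EnvVar{Name: name, Value: merged[name]})
	}
	return out
}

// shouldDeleteJob reports whether the experiment job should be removed
// after completion. Assumption: anything other than "delete" means retain.
func shouldDeleteJob(jobCleanUpPolicy string) bool {
	return jobCleanUpPolicy == "delete"
}

func main() {
	// Illustrative values only; real experiments define their own defaults.
	experimentDefaults := []EnvVar{
		{Name: "TOTAL_CHAOS_DURATION", Value: "30"},
		{Name: "CHAOS_INTERVAL", Value: "10"},
	}
	engineOverrides := []EnvVar{
		{Name: "TOTAL_CHAOS_DURATION", Value: "60"},
	}
	for _, e := range overrideEnv(experimentDefaults, engineOverrides) {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
	fmt.Println("delete job:", shouldDeleteJob("retain"))
}
```

Sorting the merged variables is a choice made for this sketch to keep the output stable; it is not a claim about how the runner orders them.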

Objectives behind the creation of chaos-runner:

  • Support a contextual/audit logging framework in Litmus, in which the sequence of events from the creation of the engine to its eventual removal (with the experiment execution summary in between) is traceable.

  • Support termination/abort of experiments in progress, removal of all chaos residue with a single operation, etc. One way to achieve this is to pass the OwnerReference of the ChaosEngine to the experiment jobs (which, along with the runner itself, can arguably be termed child resources) so that garbage collection takes care of the delete propagation.

  • Create and/or mount volume resources (configmaps, secrets), after validating that these resources are available.

  • Support dependency management of experiments in batch runs, with possible parallel/asynchronous execution, and patch the ChaosEngine accordingly.

  • Allow multiple combinations of random execution in case of future support for Chaos Scheduling, where it may be necessary to randomize job execution based on different conditions (iteration count, minimum intervals, etc.).
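
The OwnerReference idea from the list above can be illustrated with a small, dependency-free simulation. It is a sketch under stated assumptions: `OwnerRef` and `Resource` are simplified stand-ins for the Kubernetes object metadata, the `litmuschaos.io/v1alpha1` API version and the UID are assumed sample values, and `survivors` only mimics what the cluster's garbage collector does; it is not the runner's code.

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// Resource is a simplified stand-in for a namespaced object.
type Resource struct {
	Kind   string
	Name   string
	Owners []OwnerRef
}

// engineOwnerRef builds the reference that child resources carry.
// The API version is an assumption made for this example.
func engineOwnerRef(name, uid string) OwnerRef {
	return OwnerRef{
		APIVersion: "litmuschaos.io/v1alpha1",
		Kind:       "ChaosEngine",
		Name:       name,
		UID:        uid,
	}
}

// withOwner returns a copy of r that also lists owner among its owners.
func withOwner(r Resource, owner OwnerRef) Resource {
	r.Owners = append(append([]OwnerRef(nil), r.Owners...), owner)
	return r
}

// survivors mimics cascading deletion: a resource is dropped once every
// owner it lists has been deleted. Resources without owners are kept.
func survivors(all []Resource, deletedUID string) []Resource {
	var kept []Resource
	for _, r := range all {
		remaining := 0
		for _, o := range r.Owners {
			if o.UID != deletedUID {
				remaining++
			}
		}
		if len(r.Owners) > 0 && remaining == 0 {
			continue // all owners are gone, so the child is collected
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	owner := engineOwnerRef("nginx-chaos", "uid-1234") // hypothetical engine
	cluster := []Resource{
		withOwner(Resource{Kind: "Pod", Name: "nginx-chaos-runner"}, owner),
		withOwner(Resource{Kind: "Job", Name: "pod-delete-abc"}, owner),
		{Kind: "ConfigMap", Name: "unrelated-config"},
	}
	// Deleting the engine removes the runner and the experiment job.
	for _, r := range survivors(cluster, "uid-1234") {
		fmt.Println(r.Kind, r.Name)
	}
}
```

Because both the runner pod and the experiment job point at the same engine, deleting the engine is the single operation that removes the chaos residue in this model.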

Further Improvements

  • The Go Chaos Runner is in beta, with further improvements coming soon.

How to get started?

Refer to the LitmusChaos documentation and the Experiment Documentation.

How do I contribute?

You can contribute by raising issues, improving the documentation, contributing to the core framework and tooling, etc.

Head over to the Contribution guide.

License

FOSSA Status