Compare commits

...

11 Commits

Author SHA1 Message Date
Neelanjan Manna e7b4e7dbe4
chore: adds retries with timeout for litmus and k8s client operations (#766)
* chore: adds retries for k8s api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds retries for litmus api operations

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 15:41:34 +05:30
Neelanjan Manna 62a4986c78
chore: adds common functions for helper pod lifecycle management (#764)
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-14 12:18:29 +05:30
Neelanjan Manna d626cf3ec4
Merge pull request #761 from litmuschaos/CHAOS-9404
feat: adds port filtering for ip/hostnames for network faults, adds pod-network-rate-limit fault
2025-08-13 16:40:51 +05:30
neelanjan00 59125424c3
feat: adds ip+port filtering, adds pod-network-rate-limit fault
Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
2025-08-13 16:13:24 +05:30
Neelanjan Manna 2e7ff836fc
feat: Adds multi container support for pod stress faults (#757)
* chore: Fix typo in log statement

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

* chore: adds multi-container stress chaos system with improved lifecycle management and better error handling

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>

---------

Signed-off-by: neelanjan00 <neelanjan.manna@harness.io>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-08-13 16:04:20 +05:30
Prexy e61d5b33be
written test for `workload.go` in `pkg/workloads` (#767)
* written test for workload.go in pkg/workloads

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* checking go formatting

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-08-12 17:30:22 +05:30
Prexy 14fe30c956
test: add unit tests for exec.go file in pkg/utils folder (#755)
* test: add unit tests for exec.go file in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* creating table driven test and also updates TestCheckPodStatus

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
2025-07-24 15:33:25 +05:30
Prexy 4ae08899e0
test: add unit tests for retry.go in pkg/utils folder (#754)
* test: add unit tests for retry.go in pkg/utils folder

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:42 +05:30
Prexy 2c38220cca
test: add unit tests for RandStringBytesMask and GetRunID in stringutils (#753)
* test: add unit tests for RandStringBytesMask and GetRunID in stringutils

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

* fixing gofmt

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>

---------

Signed-off-by: Prakhar-Shankar <prakharshankar247@gmail.com>
2025-07-24 11:55:26 +05:30
Sami S. 07de11eeee
Fix: handle pagination in ssm describeInstanceInformation & API Rate Limit (#738)
* Fix: handle pagination in ssm describe

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* implement exponential backoff with jitter for API rate limiting

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Refactor

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixup

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Fix: include error message from stderr if container-kill fails (#740) (#741)

Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(logs): Fix the error logs for container-kill fault (#745)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix(container-kill): Fixed the container stop command timeout issue (#747)

Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* feat: Add a rds-instance-stop chaos fault (#710)

* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Update pkg/cloud/aws/ssm/ssm-operations.go

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fix go fmt ./...

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* Filter instances on api call

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>

* fixes lint

Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>

---------

Signed-off-by: Sami Shabaneh <sami.shabaneh@careem.com>
Signed-off-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Signed-off-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
Signed-off-by: Udit Gaurav <udit.gaurav@harness.io>
Co-authored-by: Neelanjan Manna <neelanjanmanna@gmail.com>
Co-authored-by: Udit Gaurav <35391335+uditgaurav@users.noreply.github.com>
Co-authored-by: Björn Kylberg <47784470+bjoky@users.noreply.github.com>
Co-authored-by: Shubham Chaudhary <shubham.chaudhary@harness.io>
Co-authored-by: Jongwoo Han <jongwooo.han@gmail.com>
Co-authored-by: Udit Gaurav <udit.gaurav@harness.io>
2025-04-30 10:25:10 +05:30
Jongwoo Han 5c22472290
feat: Add a rds-instance-stop chaos fault (#710)
* feat: Add a rds-instance-stop chaos fault

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>

---------

Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com>
2025-04-24 12:54:05 +05:30
117 changed files with 3083 additions and 944 deletions
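
Commit 07de11eeee (#738) above describes paginating the ssm DescribeInstanceInformation call and adding exponential backoff with jitter for API rate limiting, but its hunks are not among the files shown below. A minimal sketch of that pattern using aws-sdk-go v1, with illustrative names (describeAllInstances, the attempt count, and the backoff constants are assumptions, not the PR's actual code):

import (
	"math/rand"
	"time"

	"github.com/aws/aws-sdk-go/service/ssm"
)

// describeAllInstances pages through DescribeInstanceInformation via NextToken,
// retrying each page with exponential backoff plus jitter to absorb rate limits.
// Sketch only; the real change lives in pkg/cloud/aws/ssm/ssm-operations.go.
func describeAllInstances(svc *ssm.SSM) ([]*ssm.InstanceInformation, error) {
	var instances []*ssm.InstanceInformation
	var nextToken *string
	for {
		var out *ssm.DescribeInstanceInformationOutput
		var err error
		for attempt := 0; attempt < 5; attempt++ {
			out, err = svc.DescribeInstanceInformation(&ssm.DescribeInstanceInformationInput{NextToken: nextToken})
			if err == nil {
				break
			}
			// exponential backoff (1s, 2s, 4s, ...) with up to 1s of random jitter
			time.Sleep(time.Duration(1<<attempt)*time.Second + time.Duration(rand.Intn(1000))*time.Millisecond)
		}
		if err != nil {
			return nil, err
		}
		instances = append(instances, out.InstanceInformationList...)
		if out.NextToken == nil || *out.NextToken == "" {
			return instances, nil // no more pages
		}
		nextToken = out.NextToken
	}
}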

View File

@@ -15,6 +15,8 @@ import (
// _ "k8s.io/client-go/plugin/pkg/client/auth/oidc"
// _ "k8s.io/client-go/plugin/pkg/client/auth/openstack"
"go.opentelemetry.io/otel"
awsSSMChaosByID "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-id/experiment"
awsSSMChaosByTag "github.com/litmuschaos/litmus-go/experiments/aws-ssm/aws-ssm-chaos-by-tag/experiment"
azureDiskLoss "github.com/litmuschaos/litmus-go/experiments/azure/azure-disk-loss/experiment"
@@ -55,11 +57,13 @@ import (
podNetworkLatency "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-latency/experiment"
podNetworkLoss "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-loss/experiment"
podNetworkPartition "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-partition/experiment"
podNetworkRateLimit "github.com/litmuschaos/litmus-go/experiments/generic/pod-network-rate-limit/experiment"
kafkaBrokerPodFailure "github.com/litmuschaos/litmus-go/experiments/kafka/kafka-broker-pod-failure/experiment"
ebsLossByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-id/experiment"
ebsLossByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ebs-loss-by-tag/experiment"
ec2TerminateByID "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-id/experiment"
ec2TerminateByTag "github.com/litmuschaos/litmus-go/experiments/kube-aws/ec2-terminate-by-tag/experiment"
rdsInstanceStop "github.com/litmuschaos/litmus-go/experiments/kube-aws/rds-instance-stop/experiment"
k6Loadgen "github.com/litmuschaos/litmus-go/experiments/load/k6-loadgen/experiment"
springBootFaults "github.com/litmuschaos/litmus-go/experiments/spring-boot/spring-boot-faults/experiment"
vmpoweroff "github.com/litmuschaos/litmus-go/experiments/vmware/vm-poweroff/experiment"
@@ -67,7 +71,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
)
func init() {
@@ -153,6 +156,8 @@ func main() {
podNetworkLoss.PodNetworkLoss(ctx, clients)
case "pod-network-partition":
podNetworkPartition.PodNetworkPartition(ctx, clients)
case "pod-network-rate-limit":
podNetworkRateLimit.PodNetworkRateLimit(ctx, clients)
case "pod-memory-hog":
podMemoryHog.PodMemoryHog(ctx, clients)
case "pod-cpu-hog":
@@ -171,6 +176,8 @@ func main() {
ebsLossByID.EBSLossByID(ctx, clients)
case "ebs-loss-by-tag":
ebsLossByTag.EBSLossByTag(ctx, clients)
case "rds-instance-stop":
rdsInstanceStop.RDSInstanceStop(ctx, clients)
case "node-restart":
nodeRestart.NodeRestart(ctx, clients)
case "pod-dns-error":

View File

@@ -208,10 +208,11 @@ func stopDockerContainer(containerIDs []string, socketPath, signal, source strin
// getRestartCount return the restart count of target container
func getRestartCount(target targetDetails, clients clients.ClientSets) (int, error) {
pod, err := clients.KubeClient.CoreV1().Pods(target.Namespace).Get(context.Background(), target.Name, v1.GetOptions{})
pod, err := clients.GetPod(target.Namespace, target.Name, 180, 2)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: target.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", target.Name, target.Namespace), Reason: err.Error()}
}
restartCount := 0
for _, container := range pod.Status.ContainerStatuses {
if container.Name == target.TargetContainer {
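
The clients.GetPod(target.Namespace, target.Name, 180, 2) call above comes from commit e7b4e7dbe4, which wraps Kubernetes API reads in retries; from the call sites, the trailing arguments read as a timeout and a retry delay in seconds. A standalone sketch under that assumption (the real implementation lives in pkg/clients and its signature may differ):

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getPodWithRetry re-attempts the Get every `delay` seconds until `timeout` seconds elapse.
func getPodWithRetry(kube kubernetes.Interface, namespace, name string, timeout, delay int) (*corev1.Pod, error) {
	var lastErr error
	deadline := time.Now().Add(time.Duration(timeout) * time.Second)
	for time.Now().Before(deadline) {
		pod, err := kube.CoreV1().Pods(namespace).Get(context.Background(), name, v1.GetOptions{})
		if err == nil {
			return pod, nil
		}
		lastErr = err
		time.Sleep(time.Duration(delay) * time.Second)
	}
	return nil, fmt.Errorf("timed out getting pod %s/%s: %w", namespace, name, lastErr)
}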

View File

@@ -16,7 +16,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/container-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
@@ -59,7 +58,7 @@ func PrepareContainerKill(ctx context.Context, experimentsDetails *experimentTyp
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not experiment service account")
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
@@ -118,26 +117,8 @@ func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experiment
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, experimentsDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
@@ -170,26 +151,8 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, experimentsDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
@@ -262,10 +225,10 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
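
The deleted blocks above (status check, wait-for-completion, cleanup) are what commit 62a4986c78 folds into the common.ManagerHelperLifecycle call. Reassembling those removed lines gives a plausible sketch of the consolidated helper; the signature and the assumption that chaosDetails carries ChaosDuration/Timeout/Delay are inferred from the call sites, not taken from the actual pkg/utils/common source:

// sketch: one place for the helper-pod lifecycle previously repeated per fault
func manageHelperLifecycle(appLabel string, chaosDetails *types.ChaosDetails, clients clients.ClientSets, containerNamesRequired bool) error {
	// wait till the helper pods reach running state, else clean up and fail
	if err := status.CheckHelperStatus(chaosDetails.ChaosNamespace, appLabel, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
		if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
			return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
		}
		return stacktrace.Propagate(err, "could not check helper status")
	}
	// wait for completion, bounded by chaos duration plus the status timeout
	podStatus, err := status.WaitForCompletion(chaosDetails.ChaosNamespace, appLabel, clients, chaosDetails.ChaosDuration+chaosDetails.Timeout, common.GetContainerNames(chaosDetails)...)
	if err != nil || podStatus == "Failed" {
		if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
			return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
		}
		return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, containerNamesRequired)
	}
	// delete the helper pods once the fault window is over
	if err := common.DeleteAllPod(appLabel, chaosDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
		return stacktrace.Propagate(err, "could not delete helper pod(s)")
	}
	return nil
}

Inside pkg/utils/common itself the common. qualifiers would drop; they are kept here so the sketch lines up with the deleted call sites.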

View File

@@ -3,10 +3,6 @@ package helper
import (
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/exec"
"os/signal"
@@ -15,6 +11,11 @@ import (
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
@@ -197,7 +198,7 @@ func fillDisk(t targetDetails, bs int) error {
// getEphemeralStorageAttributes derive the ephemeral storage attributes from the target pod
func getEphemeralStorageAttributes(t targetDetails, clients clients.ClientSets) (int64, error) {
pod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}
@@ -251,7 +252,7 @@ func getSizeToBeFilled(experimentsDetails *experimentTypes.ExperimentDetails, us
// revertDiskFill will delete the target pod if target pod is evicted
// if target pod is still running then it will delete the files, which was created during chaos execution
func revertDiskFill(t targetDetails, clients clients.ClientSets) error {
pod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
pod, err := clients.GetPod(t.Namespace, t.Name, 180, 2)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s,namespace: %s}", t.Name, t.Namespace), Reason: err.Error()}
}

View File

@@ -16,7 +16,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/disk-fill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/exec"
@@ -64,7 +63,7 @@ func PrepareDiskFill(ctx context.Context, experimentsDetails *experimentTypes.Ex
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not experiment service account")
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
@@ -122,26 +121,8 @@ func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experiment
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@@ -153,7 +134,7 @@ func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experiment
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, targetPodList apiv1.PodList, clients clients.ClientSets, chaosDetails *types.ChaosDetails, execCommandDetails exec.PodDetails, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectDiskFillFaultInParallelMode")
defer span.End()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
@@ -177,26 +158,8 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for disk-fill chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
@@ -268,10 +231,10 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}

View File

@@ -14,8 +14,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/docker-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
@@ -69,40 +67,8 @@ func PrepareDockerServiceKill(ctx context.Context, experimentsDetails *experimen
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
}
// Checking for the node to be in not-ready state
log.Info("[Status]: Check for the node to be in NotReady state")
if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check for NOT READY state")
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
//Waiting for the ramp time after chaos injection

View File

@@ -16,7 +16,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/http-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
@@ -52,7 +51,7 @@ func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTy
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not experiment service account")
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
@@ -113,26 +112,8 @@ func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experiment
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for http chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@@ -167,26 +148,8 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
// Deleting all the helper pod for http chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
@@ -264,10 +227,10 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}

View File

@@ -12,7 +12,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/load/k6-loadgen/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
@@ -48,26 +47,8 @@ func experimentExecution(ctx context.Context, experimentsDetails *experimentType
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, experimentsDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for k6-loadgen chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil

View File

@@ -14,7 +14,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/kubelet-service-kill/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
@@ -70,41 +69,21 @@ func PrepareKubeletKill(ctx context.Context, experimentsDetails *experimentTypes
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
common.SetTargets(experimentsDetails.TargetNode, "targeted", "node", chaosDetails)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
}
// Checking for the node to be in not-ready state
log.Info("[Status]: Check for the node to be in NotReady state")
if err = status.CheckNodeNotReadyState(experimentsDetails.TargetNode, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
}
return stacktrace.Propagate(err, "could not check for NOT READY state")
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod")
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@@ -112,6 +91,7 @@ func PrepareKubeletKill(ctx context.Context, experimentsDetails *experimentTypes
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
@@ -204,10 +184,10 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
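
Two more consolidated helpers appear above: common.CheckHelperStatusAndRunProbes and common.WaitForCompletionAndDeleteHelperPods. Judging purely from the removed lines, the first plausibly bundles the helper status check, target registration, and the DuringChaos probe run, with cleanup-policy-aware deletion on failure. A sketch under that assumption (body and signature are guesses assembled from the deleted code, not the real implementation):

func checkHelperStatusAndRunProbes(ctx context.Context, appLabel, targetNode string, chaosDetails *types.ChaosDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails) error {
	// wait till the helper pod comes to running state, else clean up and fail
	if err := status.CheckHelperStatus(chaosDetails.ChaosNamespace, appLabel, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
		if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
			return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
		}
		return stacktrace.Propagate(err, "could not check helper status")
	}
	common.SetTargets(targetNode, "targeted", "node", chaosDetails)
	// run the probes during chaos, preserving the probe error alongside any cleanup error
	if len(resultDetails.ProbeDetails) != 0 {
		if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
			if deleteErr := common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients); deleteErr != nil {
				return cerrors.PreserveError{ErrString: fmt.Sprintf("[err: %v, delete error: %v]", err, deleteErr)}
			}
			return err
		}
	}
	return nil
}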

View File

@@ -3,11 +3,6 @@ package helper
import (
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"os"
"os/exec"
"os/signal"
@@ -16,6 +11,12 @@ import (
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/events"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
@@ -83,7 +84,6 @@ func Helper(ctx context.Context, clients clients.ClientSets) {
// preparePodNetworkChaos contains the prepration steps before chaos injection
func preparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails, resultDetails *types.ResultDetails) error {
targetEnv := os.Getenv("TARGETS")
if targetEnv == "" {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: chaosDetails.ChaosPodName, Reason: "no target found, provide atleast one target"}
@@ -109,10 +109,10 @@ func preparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetail
return stacktrace.Propagate(err, "could not get container id")
}
// extract out the pid of the target container
td.Pid, err = common.GetPauseAndSandboxPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
// extract out the network ns path of the pod sandbox or pause container
td.NetworkNsPath, err = common.GetNetworkNsPath(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
return stacktrace.Propagate(err, "could not get container network ns path")
}
targets = append(targets, td)
@@ -128,14 +128,17 @@ func preparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetail
default:
}
for _, t := range targets {
for index, t := range targets {
// injecting network chaos inside target container
if err = injectChaos(experimentsDetails.NetworkInterface, t); err != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if _, revertErr := killnetem(t, experimentsDetails.NetworkInterface); revertErr != nil {
if revertErr := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
@@ -152,18 +155,25 @@ func preparePodNetworkChaos(experimentsDetails *experimentTypes.ExperimentDetail
common.WaitForDuration(experimentsDetails.ChaosDuration)
log.Info("[Chaos]: duration is over, reverting chaos")
log.Info("[Chaos]: Duration is over, reverting chaos")
if err := revertChaosForAllTargets(targets, experimentsDetails.NetworkInterface, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
return nil
}
func revertChaosForAllTargets(targets []targetDetails, networkInterface string, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for _, t := range targets {
// cleaning the netem process after chaos injection
killed, err := killnetem(t, experimentsDetails.NetworkInterface)
for i := 0; i <= index; i++ {
killed, err := killnetem(targets[i], networkInterface)
if !killed && err != nil {
errList = append(errList, err.Error())
continue
}
if killed && err == nil {
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
if err = result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
}
@@ -183,16 +193,15 @@ func injectChaos(netInterface string, target targetDetails) error {
netemCommands := os.Getenv("NETEM_COMMAND")
if len(target.DestinationIps) == 0 && len(sPorts) == 0 && len(dPorts) == 0 && len(whitelistDPorts) == 0 && len(whitelistSPorts) == 0 {
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc replace dev %s root netem %v", target.Pid, netInterface, netemCommands)
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %s root %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create tc rules", target.Source); err != nil {
return err
}
} else {
// Create a priority-based queue
// This instantly creates classes 1:1, 1:2, 1:3
priority := fmt.Sprintf("sudo nsenter -t %v -n tc qdisc replace dev %v root handle 1: prio", target.Pid, netInterface)
priority := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v root handle 1: prio", target.NetworkNsPath, netInterface)
log.Info(priority)
if err := common.RunBashCommand(priority, "failed to create priority-based queue", target.Source); err != nil {
return err
@@ -200,7 +209,7 @@ func injectChaos(netInterface string, target targetDetails) error {
// Add queueing discipline for 1:3 class.
// No traffic is going through 1:3 yet
traffic := fmt.Sprintf("sudo nsenter -t %v -n tc qdisc replace dev %v parent 1:3 netem %v", target.Pid, netInterface, netemCommands)
traffic := fmt.Sprintf("sudo nsenter --net=%s tc qdisc replace dev %v parent 1:3 %v", target.NetworkNsPath, netInterface, netemCommands)
log.Info(traffic)
if err := common.RunBashCommand(traffic, "failed to create netem queueing discipline", target.Source); err != nil {
return err
@@ -209,7 +218,7 @@ func injectChaos(netInterface string, target targetDetails) error {
if len(whitelistDPorts) != 0 || len(whitelistSPorts) != 0 {
for _, port := range whitelistDPorts {
//redirect traffic to specific dport through band 2
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip dport %v 0xffff flowid 1:2", target.Pid, netInterface, port)
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip dport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist dport match filters", target.Source); err != nil {
return err
@@ -218,26 +227,55 @@ func injectChaos(netInterface string, target targetDetails) error {
for _, port := range whitelistSPorts {
//redirect traffic to specific sport through band 2
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip sport %v 0xffff flowid 1:2", target.Pid, netInterface, port)
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 2 u32 match ip sport %v 0xffff flowid 1:2", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create whitelist sport match filters", target.Source); err != nil {
return err
}
}
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst 0.0.0.0/0 flowid 1:3", target.Pid, netInterface)
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst 0.0.0.0/0 flowid 1:3", target.NetworkNsPath, netInterface)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create rule for all ports match filters", target.Source); err != nil {
return err
}
} else {
for _, ip := range target.DestinationIps {
// redirect traffic to specific IP through band 3
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dst %v flowid 1:3", target.Pid, netInterface, ip)
if strings.Contains(ip, ":") {
tc = fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip6 dst %v flowid 1:3", target.Pid, netInterface, ip)
for i := range target.DestinationIps {
var (
ip = target.DestinationIps[i]
ports []string
isIPV6 = strings.Contains(target.DestinationIps[i], ":")
)
// extracting the destination ports from the ips
// ip format is ip(|port1|port2....|portx)
if strings.Contains(target.DestinationIps[i], "|") {
ip = strings.Split(target.DestinationIps[i], "|")[0]
ports = strings.Split(target.DestinationIps[i], "|")[1:]
}
// redirect traffic to specific IP through band 3
filter := fmt.Sprintf("match ip dst %v", ip)
if isIPV6 {
filter = fmt.Sprintf("match ip6 dst %v", ip)
}
if len(ports) != 0 {
for _, port := range ports {
portFilter := fmt.Sprintf("%s match ip dport %v 0xffff", filter, port)
if isIPV6 {
portFilter = fmt.Sprintf("%s match ip6 dport %v 0xffff", filter, port)
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, portFilter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
}
}
continue
}
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 %s flowid 1:3", target.NetworkNsPath, netInterface, filter)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ips match filters", target.Source); err != nil {
return err
@@ -246,7 +284,7 @@ func injectChaos(netInterface string, target targetDetails) error {
for _, port := range sPorts {
//redirect traffic to specific sport through band 3
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip sport %v 0xffff flowid 1:3", target.Pid, netInterface, port)
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip sport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create source ports match filters", target.Source); err != nil {
return err
@@ -255,7 +293,7 @@ func injectChaos(netInterface string, target targetDetails) error {
for _, port := range dPorts {
//redirect traffic to specific dport through band 3
tc := fmt.Sprintf("sudo nsenter -t %v -n tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dport %v 0xffff flowid 1:3", target.Pid, netInterface, port)
tc := fmt.Sprintf("sudo nsenter --net=%s tc filter add dev %v protocol ip parent 1:0 prio 3 u32 match ip dport %v 0xffff flowid 1:3", target.NetworkNsPath, netInterface, port)
log.Info(tc)
if err := common.RunBashCommand(tc, "failed to create destination ports match filters", target.Source); err != nil {
return err
@@ -270,8 +308,7 @@ func injectChaos(netInterface string, target targetDetails) error {
// killnetem kill the netem process for all the target containers
func killnetem(target targetDetails, networkInterface string) (bool, error) {
tc := fmt.Sprintf("sudo nsenter -t %d -n tc qdisc delete dev %s root", target.Pid, networkInterface)
tc := fmt.Sprintf("sudo nsenter --net=%s tc qdisc delete dev %s root", target.NetworkNsPath, networkInterface)
cmd := exec.Command("/bin/bash", "-c", tc)
out, err := cmd.CombinedOutput()
@@ -296,8 +333,8 @@ type targetDetails struct {
DestinationIps []string
TargetContainer string
ContainerId string
Pid int
Source string
NetworkNsPath string
}
// getENV fetches all the env variables from the runner pod
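
Commit 59125424c3 extends DestinationIps entries to an ip(|port1|port2...|portx) format, as parsed above. A standalone sketch of that parsing and of the tc u32 filter fragment it yields, mirroring the logic in injectChaos (splitIPAndPorts and filterFor are illustrative names):

import (
	"fmt"
	"strings"
)

// splitIPAndPorts parses a target entry of the form "ip(|port1|port2...)".
func splitIPAndPorts(entry string) (string, []string) {
	parts := strings.Split(entry, "|")
	return parts[0], parts[1:]
}

// filterFor builds the u32 match fragment for one ip/port pair; pass an empty
// port to match on the ip alone.
func filterFor(ip, port string) string {
	proto := "ip"
	if strings.Contains(ip, ":") { // IPv6 address
		proto = "ip6"
	}
	filter := fmt.Sprintf("match %s dst %v", proto, ip)
	if port != "" {
		filter = fmt.Sprintf("%s match %s dport %v 0xffff", filter, proto, port)
	}
	return filter
}

For example, splitIPAndPorts("10.4.3.2|8080|9090") returns "10.4.3.2" and ["8080", "9090"], and filterFor("10.4.3.2", "8080") yields "match ip dst 10.4.3.2 match ip dport 8080 0xffff", which injectChaos splices into: sudo nsenter --net=<nsPath> tc filter add dev <iface> protocol ip parent 1:0 prio 3 u32 <filter> flowid 1:3.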

View File

@@ -2,6 +2,7 @@ package corruption
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
@@ -16,6 +17,10 @@ func PodNetworkCorruptionChaos(ctx context.Context, experimentsDetails *experime
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkCorruptionFault")
defer span.End()
args := "corrupt " + experimentsDetails.NetworkPacketCorruptionPercentage
args := "netem corrupt " + experimentsDetails.NetworkPacketCorruptionPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
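
The four netem wrappers (corruption above, and duplication/latency/loss below) now spell out the literal "netem" qdisc name in the args, presumably so the same helper template can also carry non-netem qdiscs for the new pod-network-rate-limit fault, and they append an optional correlation value. A compact sketch of the shared pattern (buildNetemArgs is an illustrative name):

import "fmt"

// buildNetemArgs mirrors the wrappers above: a netem sub-command, its value,
// and a correlation appended only when positive.
func buildNetemArgs(subCmd, value string, correlation int) string {
	args := "netem " + subCmd + " " + value
	if correlation > 0 {
		args = fmt.Sprintf("%s %d", args, correlation)
	}
	return args
}

buildNetemArgs("corrupt", "100", 25) returns "netem corrupt 100 25", which the helper expands to roughly: sudo nsenter --net=<nsPath> tc qdisc replace dev eth0 parent 1:3 netem corrupt 100 25.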

View File

@@ -2,6 +2,7 @@ package duplication
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
@@ -16,6 +17,10 @@ func PodNetworkDuplicationChaos(ctx context.Context, experimentsDetails *experim
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkDuplicationFault")
defer span.End()
args := "duplicate " + experimentsDetails.NetworkPacketDuplicationPercentage
args := "netem duplicate " + experimentsDetails.NetworkPacketDuplicationPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}

View File

@@ -2,6 +2,7 @@ package latency
import (
"context"
"fmt"
"strconv"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
@@ -17,6 +18,10 @@ func PodNetworkLatencyChaos(ctx context.Context, experimentsDetails *experimentT
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLatencyFault")
defer span.End()
args := "delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms " + strconv.Itoa(experimentsDetails.Jitter) + "ms"
args := "netem delay " + strconv.Itoa(experimentsDetails.NetworkLatency) + "ms " + strconv.Itoa(experimentsDetails.Jitter) + "ms"
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}

View File

@@ -2,6 +2,7 @@ package loss
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
@@ -16,6 +17,10 @@ func PodNetworkLossChaos(ctx context.Context, experimentsDetails *experimentType
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkLossFault")
defer span.End()
args := "loss " + experimentsDetails.NetworkPacketLossPercentage
args := "netem loss " + experimentsDetails.NetworkPacketLossPercentage
if experimentsDetails.Correlation > 0 {
args = fmt.Sprintf("%s %d", args, experimentsDetails.Correlation)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}

View File

@@ -18,7 +18,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
@@ -59,7 +58,7 @@ func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTy
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not experiment service account")
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
@@ -120,25 +119,8 @@ func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experiment
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for network chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@@ -178,26 +160,8 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
@@ -208,20 +172,24 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "CreatePodNetworkFaultHelperPod")
defer span.End()
privilegedEnable := true
terminationGracePeriodSeconds := int64(experimentsDetails.TerminationGracePeriodSeconds)
var (
privilegedEnable = true
terminationGracePeriodSeconds = int64(experimentsDetails.TerminationGracePeriodSeconds)
helperName = fmt.Sprintf("%s-helper-%s", experimentsDetails.ExperimentName, stringutils.GetRunID())
)
helperPod := &apiv1.Pod{
ObjectMeta: v1.ObjectMeta{
GenerateName: experimentsDetails.ExperimentName + "-helper-",
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
Name: helperName,
Namespace: experimentsDetails.ChaosNamespace,
Labels: common.GetHelperLabels(chaosDetails.Labels, runID, experimentsDetails.ExperimentName),
Annotations: chaosDetails.Annotations,
},
Spec: apiv1.PodSpec{
HostPID: true,
TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
ImagePullSecrets: chaosDetails.ImagePullSecrets,
Tolerations: chaosDetails.Tolerations,
ServiceAccountName: experimentsDetails.ChaosServiceAccount,
RestartPolicy: apiv1.RestartPolicyNever,
NodeName: nodeName,
@@ -275,10 +243,27 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
// mount the network ns path for crio runtime
// it is required to access the sandbox network ns
if strings.ToLower(experimentsDetails.ContainerRuntime) == "crio" {
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, apiv1.Volume{
Name: "netns-path",
VolumeSource: apiv1.VolumeSource{
HostPath: &apiv1.HostPathVolumeSource{
Path: "/var/run/netns",
},
},
})
helperPod.Spec.Containers[0].VolumeMounts = append(helperPod.Spec.Containers[0].VolumeMounts, apiv1.VolumeMount{
Name: "netns-path",
MountPath: "/var/run/netns",
})
}
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
@ -344,7 +329,7 @@ func getPodIPFromService(host string, clients clients.ClientSets) ([]string, err
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{host: %s}", host), Reason: "provide the valid FQDN for service in '<svc-name>.<namespace>.svc.cluster.local' format"}
}
svcName, svcNs := svcFields[0], svcFields[1]
svc, err := clients.KubeClient.CoreV1().Services(svcNs).Get(context.Background(), svcName, v1.GetOptions{})
svc, err := clients.GetService(svcNs, svcName)
if err != nil {
if k8serrors.IsForbidden(err) {
log.Warnf("forbidden - failed to get %v service in %v namespace, err: %v", svcName, svcNs, err)
@ -365,7 +350,7 @@ func getPodIPFromService(host string, clients clients.ClientSets) ([]string, err
svcSelector += fmt.Sprintf(",%s=%s", k, v)
}
pods, err := clients.KubeClient.CoreV1().Pods(svcNs).List(context.Background(), v1.ListOptions{LabelSelector: svcSelector})
pods, err := clients.ListPods(svcNs, svcSelector)
if err != nil {
return ips, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{svcName: %s, podLabel: %s, namespace: %s}", svcName, svcSelector, svcNs), Reason: fmt.Sprintf("failed to derive pods from service: %s", err.Error())}
}
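Throughout this change, direct client-go calls (Services().Get, Pods().List, Nodes().Get/Update, Pods().Create) are replaced by thin ClientSets wrappers such as clients.GetService and clients.ListPods. The wrappers are defined outside this excerpt; below is a minimal sketch of one, assuming a retry helper in the spirit of pkg/utils/retry, with an illustrative attempt budget and delay rather than the repository's exact policy.

package clients

import (
	"context"
	"time"

	"github.com/litmuschaos/litmus-go/pkg/utils/retry"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// GetService is a sketch of a retrying wrapper over client-go; the attempt
// count and delay below are assumptions, not the repository's exact policy.
func (clients ClientSets) GetService(namespace, name string) (*corev1.Service, error) {
	var svc *corev1.Service
	err := retry.
		Times(90).             // assumed attempt budget
		Wait(2 * time.Second). // assumed delay between attempts
		Try(func(attempt uint) error {
			s, err := clients.KubeClient.CoreV1().Services(namespace).Get(context.Background(), name, metav1.GetOptions{})
			if err != nil {
				return err
			}
			svc = s
			return nil
		})
	return svc, err
}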
@ -389,27 +374,49 @@ func getIpsForTargetHosts(targetHosts string, clients clients.ClientSets, servic
var commaSeparatedIPs []string
for i := range hosts {
hosts[i] = strings.TrimSpace(hosts[i])
if strings.Contains(hosts[i], "svc.cluster.local") && serviceMesh {
ips, err := getPodIPFromService(hosts[i], clients)
var (
hostName = hosts[i]
ports []string
)
if strings.Contains(hosts[i], "|") {
host := strings.Split(hosts[i], "|")
hostName = host[0]
ports = host[1:]
log.Infof("host and port: %v :%v", hostName, ports)
}
if strings.Contains(hostName, "svc.cluster.local") && serviceMesh {
ips, err := getPodIPFromService(hostName, clients)
if err != nil {
return "", stacktrace.Propagate(err, "could not get pod ips from service")
}
log.Infof("Host: {%v}, IP address: {%v}", hosts[i], ips)
commaSeparatedIPs = append(commaSeparatedIPs, ips...)
if ports != nil {
for j := range ips {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j]+"|"+strings.Join(ports, "|"))
}
} else {
commaSeparatedIPs = append(commaSeparatedIPs, ips...)
}
if finalHosts == "" {
finalHosts = hosts[i]
} else {
finalHosts = finalHosts + "," + hosts[i]
}
finalHosts = finalHosts + "," + hosts[i]
continue
}
ips, err := net.LookupIP(hosts[i])
ips, err := net.LookupIP(hostName)
if err != nil {
log.Warnf("Unknown host: {%v}, it won't be included in the scope of chaos", hosts[i])
log.Warnf("Unknown host: {%v}, it won't be included in the scope of chaos", hostName)
} else {
for j := range ips {
log.Infof("Host: {%v}, IP address: {%v}", hosts[i], ips[j])
log.Infof("Host: {%v}, IP address: {%v}", hostName, ips[j])
if ports != nil {
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String()+"|"+strings.Join(ports, "|"))
continue
}
commaSeparatedIPs = append(commaSeparatedIPs, ips[j].String())
}
if finalHosts == "" {
@ -505,6 +512,7 @@ func logExperimentFields(experimentsDetails *experimentTypes.ExperimentDetails)
"NetworkPacketLossPercentage": experimentsDetails.NetworkPacketLossPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-latency":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
@ -512,18 +520,28 @@ func logExperimentFields(experimentsDetails *experimentTypes.ExperimentDetails)
"Jitter": experimentsDetails.Jitter,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-corruption":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketCorruptionPercentage": experimentsDetails.NetworkPacketCorruptionPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-duplication":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkPacketDuplicationPercentage": experimentsDetails.NetworkPacketDuplicationPercentage,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
case "network-rate-limit":
log.InfoWithValues("[Info]: The chaos tunables are:", logrus.Fields{
"NetworkBandwidth": experimentsDetails.NetworkBandwidth,
"Sequence": experimentsDetails.Sequence,
"PodsAffectedPerc": experimentsDetails.PodsAffectedPerc,
"Correlation": experimentsDetails.Correlation,
})
}
}

View File

@ -0,0 +1,29 @@
package rate
import (
"context"
"fmt"
network_chaos "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"go.opentelemetry.io/otel"
)
// PodNetworkRateChaos contains the steps to prepare and inject chaos
func PodNetworkRateChaos(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PreparePodNetworkRateLimit")
defer span.End()
args := fmt.Sprintf("tbf rate %s burst %s limit %s", experimentsDetails.NetworkBandwidth, experimentsDetails.Burst, experimentsDetails.Limit)
if experimentsDetails.PeakRate != "" {
args = fmt.Sprintf("%s peakrate %s", args, experimentsDetails.PeakRate)
}
if experimentsDetails.MinBurst != "" {
args = fmt.Sprintf("%s mtu %s", args, experimentsDetails.MinBurst)
}
return network_chaos.PrepareAndInjectChaos(ctx, experimentsDetails, clients, resultDetails, eventsDetails, chaosDetails, args)
}
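PodNetworkRateChaos composes the argument string for tc's token bucket filter; note that MinBurst is passed as tbf's mtu knob, which is how tc exposes the size of the peak-rate bucket. With assumed tunable values, the Sprintf chain above produces:

package main

import "fmt"

func main() {
	// assumed tunable values, mirroring the Sprintf chain above
	bandwidth, burst, limit := "1mbit", "32kb", "10000"
	peakRate, minBurst := "2mbit", "1540"

	args := fmt.Sprintf("tbf rate %s burst %s limit %s", bandwidth, burst, limit)
	if peakRate != "" {
		args = fmt.Sprintf("%s peakrate %s", args, peakRate)
	}
	if minBurst != "" {
		args = fmt.Sprintf("%s mtu %s", args, minBurst)
	}
	fmt.Println(args)
	// Output: tbf rate 1mbit burst 32kb limit 10000 peakrate 2mbit mtu 1540
}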

View File

@ -198,29 +198,8 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
@ -228,7 +207,7 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
// setCPUCapacity fetch the node cpu capacity
func setCPUCapacity(experimentsDetails *experimentTypes.ExperimentDetails, appNode string, clients clients.ClientSets) error {
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), appNode, v1.GetOptions{})
node, err := clients.GetNode(appNode, experimentsDetails.Timeout, experimentsDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNode), Reason: err.Error()}
}
@ -282,10 +261,10 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}

View File

@ -172,7 +172,7 @@ func uncordonNode(experimentsDetails *experimentTypes.ExperimentDetails, clients
for _, targetNode := range targetNodes {
//Check node exist before uncordon the node
_, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), targetNode, v1.GetOptions{})
_, err := clients.GetNode(targetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
if apierrors.IsNotFound(err) {
log.Infof("[Info]: The %v node is no longer exist, skip uncordon the node", targetNode)

View File

@ -126,20 +126,11 @@ func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experiment
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
common.SetTargets(appNode, "injected", "node", chaosDetails)
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
common.SetTargets(appNode, "reverted", "node", chaosDetails)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
@ -181,31 +172,12 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "injected", "node", chaosDetails)
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "reverted", "node", chaosDetails)
}
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
@ -249,10 +221,10 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}

View File

@ -16,11 +16,9 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-memory-hog/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@ -134,30 +132,10 @@ func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experiment
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
common.SetTargets(appNode, "targeted", "node", chaosDetails)
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, experimentsDetails.ExperimentName)
if err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
} else if podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pod status is %v", podStatus)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err := common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
}
return nil
@ -211,29 +189,12 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
for _, appNode := range targetNodeList {
common.SetTargets(appNode, "targeted", "node", chaosDetails)
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, false); err != nil {
return err
}
return nil
@ -241,8 +202,7 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
// getNodeMemoryDetails will return the total memory capacity and memory allocatable of an application node
func getNodeMemoryDetails(appNodeName string, clients clients.ClientSets) (int, int, error) {
nodeDetails, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), appNodeName, v1.GetOptions{})
nodeDetails, err := clients.GetNode(appNodeName, 180, 2)
if err != nil {
return 0, 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", appNodeName), Reason: err.Error()}
}
@ -365,10 +325,10 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}

View File

@ -15,8 +15,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/node-restart/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
@ -92,34 +90,12 @@ func PrepareNodeRestart(ctx context.Context, experimentsDetails *experimentTypes
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, experimentsDetails.RunID)
//Checking the status of helper pod
log.Info("[Status]: Checking the status of the helper pod")
if err = status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
if err := common.CheckHelperStatusAndRunProbes(ctx, appLabel, experimentsDetails.TargetNode, chaosDetails, clients, resultDetails, eventsDetails); err != nil {
return err
}
common.SetTargets(experimentsDetails.TargetNode, "targeted", "node", chaosDetails)
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return err
}
}
// Wait till the completion of helper pod
log.Info("[Wait]: Waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteHelperPodBasedOnJobCleanupPolicy(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, false)
}
//Deleting the helper pod
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeletePod(experimentsDetails.ExperimentName+"-helper-"+experimentsDetails.RunID, appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod")
if err := common.WaitForCompletionAndDeleteHelperPods(appLabel, chaosDetails, clients, false); err != nil {
return err
}
//Waiting for the ramp time after chaos injection
@ -127,6 +103,7 @@ func PrepareNodeRestart(ctx context.Context, experimentsDetails *experimentTypes
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", strconv.Itoa(experimentsDetails.RampTime))
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
@ -214,16 +191,16 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
// getInternalIP gets the internal ip of the given node
func getInternalIP(nodeName string, clients clients.ClientSets) (string, error) {
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), nodeName, v1.GetOptions{})
node, err := clients.GetNode(nodeName, 180, 2)
if err != nil {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Target: fmt.Sprintf("{nodeName: %s}", nodeName), Reason: err.Error()}
}

View File

@ -23,7 +23,6 @@ import (
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
var (
@ -132,7 +131,7 @@ func taintNode(ctx context.Context, experimentsDetails *experimentTypes.Experime
log.Infof("Add %v taints to the %v node", taintKey+"="+taintValue+":"+taintEffect, experimentsDetails.TargetNode)
// get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{})
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
@ -158,8 +157,7 @@ func taintNode(ctx context.Context, experimentsDetails *experimentTypes.Experime
Effect: apiv1.TaintEffect(taintEffect),
})
_, err := clients.KubeClient.CoreV1().Nodes().Update(context.Background(), node, v1.UpdateOptions{})
if err != nil {
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to add taints: %s", err.Error())}
}
}
@ -179,7 +177,7 @@ func removeTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails,
taintKey := strings.Split(taintLabel[0], "=")[0]
// get the node details
node, err := clients.KubeClient.CoreV1().Nodes().Get(context.Background(), experimentsDetails.TargetNode, v1.GetOptions{})
node, err := clients.GetNode(experimentsDetails.TargetNode, chaosDetails.Timeout, chaosDetails.Delay)
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", experimentsDetails.TargetNode), Reason: err.Error()}
}
@ -202,8 +200,7 @@ func removeTaintFromNode(experimentsDetails *experimentTypes.ExperimentDetails,
}
}
node.Spec.Taints = newTaints
updatedNodeWithTaint, err := clients.KubeClient.CoreV1().Nodes().Update(context.Background(), node, v1.UpdateOptions{})
if err != nil || updatedNodeWithTaint == nil {
if err := clients.UpdateNode(chaosDetails, node); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Target: fmt.Sprintf("{nodeName: %s}", node.Name), Reason: fmt.Sprintf("failed to remove taints: %s", err.Error())}
}
}

View File

@ -16,11 +16,9 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/pod-dns-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@ -56,7 +54,7 @@ func PrepareAndInjectChaos(ctx context.Context, experimentsDetails *experimentTy
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not experiment service account")
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
@ -115,26 +113,8 @@ func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experiment
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return errors.Errorf("helper pods are not in running state, err: %v", err)
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
//Deleting all the helper pods for pod-dns chaos
log.Info("[Cleanup]: Deleting the helper pod")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
@ -146,7 +126,6 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodDNSFaultInParallelMode")
defer span.End()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
@ -170,30 +149,8 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
containerNames := []string{experimentsDetails.ExperimentName}
if chaosDetails.SideCar != nil {
containerNames = append(containerNames, experimentsDetails.ExperimentName+"-sidecar")
}
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, containerNames...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
//Deleting all the helper pods for pod-dns chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
@ -264,10 +221,10 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}

View File

@ -0,0 +1,260 @@
package lib
import (
"fmt"
"go.opentelemetry.io/otel"
"golang.org/x/net/context"
"os"
"os/signal"
"strings"
"syscall"
"time"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
awslib "github.com/litmuschaos/litmus-go/pkg/cloud/aws/rds"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
)
var (
err error
inject, abort chan os.Signal
)
func PrepareRDSInstanceStop(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "PrepareRDSInstanceStop")
defer span.End()
// Inject channel is used to transmit signal notifications.
inject = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to inject channel.
signal.Notify(inject, os.Interrupt, syscall.SIGTERM)
// Abort channel is used to transmit signal notifications.
abort = make(chan os.Signal, 1)
// Catch and relay certain signal(s) to abort channel.
signal.Notify(abort, os.Interrupt, syscall.SIGTERM)
// Waiting for the ramp time before chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time before injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
// Get the instance identifier or list of instance identifiers
instanceIdentifierList := strings.Split(experimentsDetails.RDSInstanceIdentifier, ",")
if experimentsDetails.RDSInstanceIdentifier == "" || len(instanceIdentifierList) == 0 {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: "no RDS instance identifier found to stop"}
}
instanceIdentifierList = common.FilterBasedOnPercentage(experimentsDetails.InstanceAffectedPerc, instanceIdentifierList)
log.Infof("[Chaos]:Number of Instance targeted: %v", len(instanceIdentifierList))
// Watching for the abort signal and revert the chaos
go abortWatcher(experimentsDetails, instanceIdentifierList, chaosDetails)
switch strings.ToLower(experimentsDetails.Sequence) {
case "serial":
if err = injectChaosInSerialMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in serial mode")
}
case "parallel":
if err = injectChaosInParallelMode(ctx, experimentsDetails, instanceIdentifierList, clients, resultDetails, eventsDetails, chaosDetails); err != nil {
return stacktrace.Propagate(err, "could not run chaos in parallel mode")
}
default:
return cerrors.Error{ErrorCode: cerrors.ErrorTypeTargetSelection, Reason: fmt.Sprintf("'%s' sequence is not supported", experimentsDetails.Sequence)}
}
// Waiting for the ramp time after chaos injection
if experimentsDetails.RampTime != 0 {
log.Infof("[Ramp]: Waiting for the %vs ramp time after injecting chaos", experimentsDetails.RampTime)
common.WaitForDuration(experimentsDetails.RampTime)
}
return nil
}
// injectChaosInSerialMode stops the RDS instances in serial mode, i.e. one after the other
func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// Stopping the chaos execution, if abort signal received
os.Exit(0)
default:
// ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
for i, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
// Run the probes during chaos
// the OnChaos probes execution will start in the first iteration and keep running for the entire chaos duration
if len(resultDetails.ProbeDetails) != 0 && i == 0 {
if err = probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// injectChaosInParallelMode stops the RDS instances in parallel mode, i.e. all at once
func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, clients clients.ClientSets, resultDetails *types.ResultDetails, eventsDetails *types.EventDetails, chaosDetails *types.ChaosDetails) error {
select {
case <-inject:
// stopping the chaos execution, if abort signal received
os.Exit(0)
default:
//ChaosStartTimeStamp contains the start timestamp, when the chaos injection begin
ChaosStartTimeStamp := time.Now()
duration := int(time.Since(ChaosStartTimeStamp).Seconds())
for duration < experimentsDetails.ChaosDuration {
log.Infof("[Info]: Target instance identifier list, %v", instanceIdentifierList)
if experimentsDetails.EngineName != "" {
msg := "Injecting " + experimentsDetails.ExperimentName + " chaos on rds instance"
types.SetEngineEventAttributes(eventsDetails, types.ChaosInject, msg, "Normal", chaosDetails)
events.GenerateEvents(eventsDetails, clients, chaosDetails, "ChaosEngine")
}
// PowerOff the instance
for _, identifier := range instanceIdentifierList {
// Stopping the RDS instance
log.Info("[Chaos]: Stopping the desired RDS instance")
if err := awslib.RDSInstanceStop(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
common.SetTargets(identifier, "injected", "RDS", chaosDetails)
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to completely stop
log.Infof("[Wait]: Wait for RDS instance '%v' to get in stopped state", identifier)
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to stop")
}
}
// Run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
return stacktrace.Propagate(err, "failed to run probes")
}
}
// Wait for chaos interval
log.Infof("[Wait]: Waiting for chaos interval of %vs", experimentsDetails.ChaosInterval)
time.Sleep(time.Duration(experimentsDetails.ChaosInterval) * time.Second)
// Starting the RDS instance
for _, identifier := range instanceIdentifierList {
log.Info("[Chaos]: Starting back the RDS instance")
if err = awslib.RDSInstanceStart(identifier, experimentsDetails.Region); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
// Wait for rds instance to get in available state
log.Infof("[Wait]: Wait for RDS instance '%v' to get in available state", identifier)
if err := awslib.WaitForRDSInstanceUp(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
return stacktrace.Propagate(err, "rds instance failed to start")
}
}
for _, identifier := range instanceIdentifierList {
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
duration = int(time.Since(ChaosStartTimeStamp).Seconds())
}
}
return nil
}
// abortWatcher watches for the abort signal and reverts the chaos
func abortWatcher(experimentsDetails *experimentTypes.ExperimentDetails, instanceIdentifierList []string, chaosDetails *types.ChaosDetails) {
<-abort
log.Info("[Abort]: Chaos Revert Started")
for _, identifier := range instanceIdentifierList {
instanceState, err := awslib.GetRDSInstanceStatus(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("Failed to get instance status when an abort signal is received: %v", err)
}
if instanceState != "running" {
log.Info("[Abort]: Waiting for the RDS instance to get down")
if err := awslib.WaitForRDSInstanceDown(experimentsDetails.Timeout, experimentsDetails.Delay, experimentsDetails.Region, identifier); err != nil {
log.Errorf("Unable to wait till stop of the instance: %v", err)
}
log.Info("[Abort]: Starting RDS instance as abort signal received")
err := awslib.RDSInstanceStart(identifier, experimentsDetails.Region)
if err != nil {
log.Errorf("RDS instance failed to start when an abort signal is received: %v", err)
}
}
common.SetTargets(identifier, "reverted", "RDS", chaosDetails)
}
log.Info("[Abort]: Chaos Revert Completed")
os.Exit(1)
}
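The awslib helpers used throughout this file live in pkg/cloud/aws/rds and are not part of this excerpt. As an assumption rather than the repository's exact code, RDSInstanceStop could look roughly like this with aws-sdk-go v1:

package rds

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	awsrds "github.com/aws/aws-sdk-go/service/rds"

	"github.com/litmuschaos/litmus-go/pkg/cerrors"
)

// RDSInstanceStop issues a StopDBInstance call for the given identifier.
// Sketch only: error wrapping and session reuse may differ upstream.
func RDSInstanceStop(identifier, region string) error {
	sess, err := session.NewSession(&aws.Config{Region: aws.String(region)})
	if err != nil {
		return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: err.Error()}
	}
	if _, err := awsrds.New(sess).StopDBInstance(&awsrds.StopDBInstanceInput{
		DBInstanceIdentifier: aws.String(identifier),
	}); err != nil {
		return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Reason: err.Error()}
	}
	return nil
}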

View File

@ -5,10 +5,6 @@ import (
"bytes"
"context"
"fmt"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/palantir/stacktrace"
"go.opentelemetry.io/otel"
"io"
"os"
"os/exec"
@ -21,16 +17,23 @@ import (
"github.com/containerd/cgroups"
cgroupsv2 "github.com/containerd/cgroups/v2"
"github.com/palantir/stacktrace"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
"go.opentelemetry.io/otel"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
clientTypes "k8s.io/apimachinery/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
clients "github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/telemetry"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
clientTypes "k8s.io/apimachinery/pkg/types"
)
// list of cgroups in a container
@ -110,33 +113,49 @@ func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, c
return stacktrace.Propagate(err, "could not parse targets")
}
var (
targets []targetDetails
)
var targets []*targetDetails
for _, t := range targetList.Target {
td := targetDetails{
Name: t.Name,
Namespace: t.Namespace,
TargetContainer: t.TargetContainer,
Source: chaosDetails.ChaosPodName,
td := &targetDetails{
Name: t.Name,
Namespace: t.Namespace,
Source: chaosDetails.ChaosPodName,
}
td.ContainerId, err = common.GetContainerID(td.Namespace, td.Name, td.TargetContainer, clients, td.Source)
td.TargetContainers, err = common.GetTargetContainers(t.Name, t.Namespace, t.TargetContainer, chaosDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not get container id")
return stacktrace.Propagate(err, "could not get target containers")
}
// extract out the pid of the target container
td.Pid, err = common.GetPID(experimentsDetails.ContainerRuntime, td.ContainerId, experimentsDetails.SocketPath, td.Source)
td.ContainerIds, err = common.GetContainerIDs(td.Namespace, td.Name, td.TargetContainers, clients, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
return stacktrace.Propagate(err, "could not get container ids")
}
td.CGroupManager, err, td.GroupPath = getCGroupManager(td)
if err != nil {
return stacktrace.Propagate(err, "could not get cgroup manager")
for _, cid := range td.ContainerIds {
// extract out the pid of the target container
pid, err := common.GetPID(experimentsDetails.ContainerRuntime, cid, experimentsDetails.SocketPath, td.Source)
if err != nil {
return stacktrace.Propagate(err, "could not get container pid")
}
td.Pids = append(td.Pids, pid)
}
for i := range td.Pids {
cGroupManagers, err, grpPath := getCGroupManager(td, i)
if err != nil {
return stacktrace.Propagate(err, "could not get cgroup manager")
}
td.GroupPath = grpPath
td.CGroupManagers = append(td.CGroupManagers, cGroupManagers)
}
log.InfoWithValues("[Info]: Details of application under chaos injection", logrus.Fields{
"PodName": td.Name,
"Namespace": td.Namespace,
"TargetContainers": td.TargetContainers,
})
targets = append(targets, td)
}
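The multi-container refactor turns the per-target scalars (TargetContainer, ContainerId, Pid, Cmd) into parallel slices so a single pod entry can stress several containers. The struct definition is outside this excerpt; a plausible shape, inferred from the field accesses in these hunks (type names below are hypothetical where not shown in the diff):

package lib

import (
	"bytes"
	"os/exec"
)

// Assumed shape of targetDetails after the multi-container change; field
// names are taken from the usages in this diff, types are inferred.
type targetDetails struct {
	Name             string
	Namespace        string
	Source           string
	TargetContainers []string      // one entry per container under chaos
	ContainerIds     []string      // parallel to TargetContainers
	Pids             []int         // parallel to ContainerIds
	CGroupManagers   []interface{} // cgroup v1/v2 manager per pid
	GroupPath        string        // cgroup v2 group path
	Cmds             []*processCmd // one running stressor per container
}

// processCmd is a hypothetical pairing of the stress process with its
// captured output, matching the t.Cmds[i].Cmd / t.Cmds[i].Buffer accesses.
type processCmd struct {
	Cmd    *exec.Cmd
	Buffer *bytes.Buffer
}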
@ -153,13 +172,20 @@ func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, c
done := make(chan error, 1)
for index, t := range targets {
targets[index].Cmd, err = injectChaos(t, stressors, experimentsDetails.StressType)
if err != nil {
return stacktrace.Propagate(err, "could not inject chaos")
for i := range t.Pids {
cmd, err := injectChaos(t, stressors, i, experimentsDetails.StressType)
if err != nil {
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index-1); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not inject chaos")
}
targets[index].Cmds = append(targets[index].Cmds, cmd)
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
log.Infof("successfully injected chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "injected", "pod", t.Name); err != nil {
if revertErr := terminateProcess(t); revertErr != nil {
if revertErr := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, index); revertErr != nil {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s,%s]", stacktrace.RootCause(err).Error(), stacktrace.RootCause(revertErr).Error())}
}
return stacktrace.Propagate(err, "could not annotate chaosresult")
@ -179,18 +205,35 @@ func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, c
var errList []string
var exitErr error
for _, t := range targets {
if err := t.Cmd.Wait(); err != nil {
if _, ok := err.(*exec.ExitError); ok {
exitErr = err
continue
for i := range t.Cmds {
if err := t.Cmds[i].Cmd.Wait(); err != nil {
log.Infof("stress process failed, err: %v, out: %v", err, t.Cmds[i].Buffer.String())
if _, ok := err.(*exec.ExitError); ok {
exitErr = err
continue
}
errList = append(errList, err.Error())
}
errList = append(errList, err.Error())
}
}
if exitErr != nil {
done <- exitErr
oomKilled, err := checkOOMKilled(targets, clients, exitErr)
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- exitErr
} else {
done <- nil
}
} else if len(errList) != 0 {
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
oomKilled, err := checkOOMKilled(targets, clients, fmt.Errorf("err: %v", strings.Join(errList, ", ")))
if err != nil {
log.Infof("could not check oomkilled, err: %v", err)
}
if !oomKilled {
done <- fmt.Errorf("err: %v", strings.Join(errList, ", "))
} else {
done <- nil
}
} else {
done <- nil
}
@ -205,24 +248,17 @@ func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, c
// the stress process gets timeout before completion
log.Infof("[Chaos] The stress process is not yet completed after the chaos duration of %vs", experimentsDetails.ChaosDuration+30)
log.Info("[Timeout]: Killing the stress process")
var errList []string
for _, t := range targets {
if err = terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
case err := <-done:
if err != nil {
exitErr, ok := err.(*exec.ExitError)
if ok {
status := exitErr.Sys().(syscall.WaitStatus)
if status.Signaled() {
log.Infof("process stopped with signal: %v", status.Signal())
}
if status.Signaled() && status.Signal() == syscall.SIGKILL {
// wait for the completion of abort handler
time.Sleep(10 * time.Second)
@ -232,34 +268,79 @@ func prepareStressChaos(experimentsDetails *experimentTypes.ExperimentDetails, c
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: chaosDetails.ChaosPodName, Reason: err.Error()}
}
log.Info("[Info]: Reverting Chaos")
var errList []string
for _, t := range targets {
if err := terminateProcess(t); err != nil {
errList = append(errList, err.Error())
continue
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if err = result.AnnotateChaosResult(resultDetails.Name, chaosDetails.ChaosNamespace, "reverted", "pod", t.Name); err != nil {
errList = append(errList, err.Error())
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
if err := revertChaosForAllTargets(targets, resultDetails, chaosDetails.ChaosNamespace, len(targets)-1); err != nil {
return stacktrace.Propagate(err, "could not revert chaos")
}
}
return nil
}
// terminateProcess will remove the stress process from the target container after chaos completion
func terminateProcess(t targetDetails) error {
if err := syscall.Kill(-t.Cmd.Process.Pid, syscall.SIGKILL); err != nil {
if strings.Contains(err.Error(), ProcessAlreadyKilled) || strings.Contains(err.Error(), ProcessAlreadyFinished) {
return nil
func revertChaosForAllTargets(targets []*targetDetails, resultDetails *types.ResultDetails, chaosNs string, index int) error {
var errList []string
for i := 0; i <= index; i++ {
if err := terminateProcess(targets[i]); err != nil {
errList = append(errList, err.Error())
continue
}
if err := result.AnnotateChaosResult(resultDetails.Name, chaosNs, "reverted", "pod", targets[i].Name); err != nil {
errList = append(errList, err.Error())
}
return cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainer)
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
// checkOOMKilled checks if the container within the target pods failed due to an OOMKilled error.
func checkOOMKilled(targets []*targetDetails, clients clients.ClientSets, chaosError error) (bool, error) {
// Check each container in the pod
for i := 0; i < 3; i++ {
for _, t := range targets {
// Fetch the target pod
targetPod, err := clients.KubeClient.CoreV1().Pods(t.Namespace).Get(context.Background(), t.Name, v1.GetOptions{})
if err != nil {
return false, cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Target: fmt.Sprintf("{podName: %s, namespace: %s}", t.Name, t.Namespace),
Reason: err.Error(),
}
}
for _, c := range targetPod.Status.ContainerStatuses {
if utils.Contains(c.Name, t.TargetContainers) {
// Check for OOMKilled and restart
if c.LastTerminationState.Terminated != nil && c.LastTerminationState.Terminated.ExitCode == 137 {
log.Warnf("[Warning]: The target container '%s' of pod '%s' got OOM Killed, err: %v", c.Name, t.Name, chaosError)
return true, nil
}
}
}
}
time.Sleep(1 * time.Second)
}
return false, nil
}
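The ExitCode == 137 check works because shells and runtimes report a signal death as 128 plus the signal number, and the kernel OOM killer delivers SIGKILL (9). A one-liner to confirm the arithmetic:

package main

import (
	"fmt"
	"syscall"
)

func main() {
	// 128 + SIGKILL(9) = 137, the conventional exit code of an OOM-killed process
	fmt.Println(128 + int(syscall.SIGKILL))
}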
// terminateProcess will remove the stress process from the target container after chaos completion
func terminateProcess(t *targetDetails) error {
var errList []string
for i := range t.Cmds {
if t.Cmds[i] != nil && t.Cmds[i].Cmd.Process != nil {
if err := syscall.Kill(-t.Cmds[i].Cmd.Process.Pid, syscall.SIGKILL); err != nil {
if strings.Contains(err.Error(), ProcessAlreadyKilled) || strings.Contains(err.Error(), ProcessAlreadyFinished) {
continue
}
errList = append(errList, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosRevert, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[i]), Reason: fmt.Sprintf("failed to revert chaos: %s", err.Error())}.Error())
continue
}
log.Infof("successfully reverted chaos on target: {name: %s, namespace: %v, container: %v}", t.Name, t.Namespace, t.TargetContainers[i])
}
}
if len(errList) != 0 {
return cerrors.PreserveError{ErrString: fmt.Sprintf("[%s]", strings.Join(errList, ","))}
}
return nil
}
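terminateProcess signals -pid, i.e. the whole process group, so the stress process and any children it spawned die together; this only works if the stressor was started in its own process group. A minimal standalone illustration (assuming, as the injector presumably does, Setpgid at launch):

package main

import (
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sleep", "60")
	// place the child in its own process group so that -pid reaches
	// the stressor and all of its children at once
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	// a negative pid addresses the entire process group
	_ = syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	_ = cmd.Wait() // reap the child; returns "signal: killed"
}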
@ -326,33 +407,33 @@ func prepareStressor(experimentDetails *experimentTypes.ExperimentDetails) []str
}
default:
log.Fatalf("stressor for %v experiment is not suported", experimentDetails.ExperimentName)
log.Fatalf("stressor for %v experiment is not supported", experimentDetails.ExperimentName)
}
return stressArgs
}
// pidPath will get the pid path of the container
func pidPath(t targetDetails) cgroups.Path {
processPath := "/proc/" + strconv.Itoa(t.Pid) + "/cgroup"
paths, err := parseCgroupFile(processPath, t)
func pidPath(t *targetDetails, index int) cgroups.Path {
processPath := "/proc/" + strconv.Itoa(t.Pids[index]) + "/cgroup"
paths, err := parseCgroupFile(processPath, t, index)
if err != nil {
return getErrorPath(errors.Wrapf(err, "parse cgroup file %s", processPath))
}
return getExistingPath(paths, t.Pid, "")
return getExistingPath(paths, t.Pids[index], "")
}
// parseCgroupFile will read and verify the cgroup file entry of a container
func parseCgroupFile(path string, t targetDetails) (map[string]string, error) {
func parseCgroupFile(path string, t *targetDetails, index int) (map[string]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("fail to parse cgroup: %s", err.Error())}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to parse cgroup: %s", err.Error())}
}
defer file.Close()
return parseCgroupFromReader(file, t)
return parseCgroupFromReader(file, t, index)
}
// parseCgroupFromReader will parse the cgroup file from the reader
func parseCgroupFromReader(r io.Reader, t targetDetails) (map[string]string, error) {
func parseCgroupFromReader(r io.Reader, t *targetDetails, index int) (map[string]string, error) {
var (
cgroups = make(map[string]string)
s = bufio.NewScanner(r)
@ -363,7 +444,7 @@ func parseCgroupFromReader(r io.Reader, t targetDetails) (map[string]string, err
parts = strings.SplitN(text, ":", 3)
)
if len(parts) < 3 {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("invalid cgroup entry: %q", text)}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("invalid cgroup entry: %q", text)}
}
for _, subs := range strings.Split(parts[1], ",") {
if subs != "" {
@ -372,7 +453,7 @@ func parseCgroupFromReader(r io.Reader, t targetDetails) (map[string]string, err
}
}
if err := s.Err(); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("buffer scanner failed: %s", err.Error())}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("buffer scanner failed: %s", err.Error())}
}
return cgroups, nil
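To make the format concrete: each line of /proc/<pid>/cgroup reads hierarchy-id:controller-list:path, and the loop above maps every controller to its path. The same parsing as a standalone sketch (the sample line is illustrative):

package main

import (
	"fmt"
	"strings"
)

func main() {
	// One cgroup v1 entry: id 4, controllers cpu and cpuacct, then the path.
	line := "4:cpu,cpuacct:/kubepods/burstable/pod42/abc123"
	parts := strings.SplitN(line, ":", 3)
	cgroups := map[string]string{}
	for _, subsystem := range strings.Split(parts[1], ",") {
		if subsystem != "" {
			cgroups[subsystem] = parts[2]
		}
	}
	fmt.Println(cgroups["cpu"]) // /kubepods/burstable/pod42/abc123
}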
@ -439,18 +520,18 @@ func getCgroupDestination(pid int, subsystem string) (string, error) {
}
// findValidCgroup will be used to get a valid cgroup path
func findValidCgroup(path cgroups.Path, t targetDetails) (string, error) {
func findValidCgroup(path cgroups.Path, t *targetDetails, index int) (string, error) {
for _, subsystem := range cgroupSubsystemList {
path, err := path(cgroups.Name(subsystem))
if err != nil {
log.Errorf("fail to retrieve the cgroup path, subsystem: %v, target: %v, err: %v", subsystem, t.ContainerId, err)
log.Errorf("fail to retrieve the cgroup path, subsystem: %v, target: %v, err: %v", subsystem, t.ContainerIds[index], err)
continue
}
if strings.Contains(path, t.ContainerId) {
if strings.Contains(path, t.ContainerIds[index]) {
return path, nil
}
}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: "could not find valid cgroup"}
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: "could not find valid cgroup"}
}
// getENV fetches all the env variables from the runner pod
@ -475,7 +556,7 @@ func getENV(experimentDetails *experimentTypes.ExperimentDetails) {
}
// abortWatcher continuously watch for the abort signals
func abortWatcher(targets []targetDetails, resultName, chaosNS string) {
func abortWatcher(targets []*targetDetails, resultName, chaosNS string) {
<-abort
@ -501,37 +582,37 @@ func abortWatcher(targets []targetDetails, resultName, chaosNS string) {
}
// getCGroupManager will return the cgroup for the given pid of the process
func getCGroupManager(t targetDetails) (interface{}, error, string) {
func getCGroupManager(t *targetDetails, index int) (interface{}, error, string) {
if cgroups.Mode() == cgroups.Unified {
groupPath := ""
output, err := exec.Command("bash", "-c", fmt.Sprintf("nsenter -t 1 -C -m -- cat /proc/%v/cgroup", t.Pid)).CombinedOutput()
output, err := exec.Command("bash", "-c", fmt.Sprintf("nsenter -t 1 -C -m -- cat /proc/%v/cgroup", t.Pids[index])).CombinedOutput()
if err != nil {
return nil, errors.Errorf("Error in getting groupPath,%s", string(output)), ""
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to get the cgroup: %s :%v", err.Error(), output)}, ""
}
parts := strings.SplitN(string(output), ":", 3)
log.Infof("cgroup output: %s", string(output))
parts := strings.Split(string(output), ":")
if len(parts) < 3 {
return "", fmt.Errorf("invalid cgroup entry: %s", string(output)), ""
}
if parts[0] == "0" && parts[1] == "" {
groupPath = parts[2]
if strings.HasSuffix(parts[len(parts)-3], "0") && parts[len(parts)-2] == "" {
groupPath = parts[len(parts)-1]
}
log.Infof("group path: %s", groupPath)
cgroup2, err := cgroupsv2.LoadManager("/sys/fs/cgroup", groupPath)
cgroup2, err := cgroupsv2.LoadManager("/sys/fs/cgroup", string(groupPath))
if err != nil {
return nil, errors.Errorf("Error loading cgroup v2 manager, %v", err), ""
}
return cgroup2, nil, groupPath
}
path := pidPath(t)
cgroup, err := findValidCgroup(path, t)
path := pidPath(t, index)
cgroup, err := findValidCgroup(path, t, index)
if err != nil {
return nil, stacktrace.Propagate(err, "could not find valid cgroup"), ""
}
cgroup1, err := cgroups.Load(cgroups.V1, cgroups.StaticPath(cgroup))
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("fail to load the cgroup: %s", err.Error())}, ""
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeHelper, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to load the cgroup: %s", err.Error())}, ""
}
return cgroup1, nil, ""
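On a unified (cgroup v2) host, /proc/<pid>/cgroup carries a single 0::<path> entry rather than one line per controller, which is what the Unified branch above extracts before handing the path to cgroupsv2.LoadManager. A simplified standalone sketch of that extraction (the real code tolerates extra leading output via HasSuffix; this assumes one clean entry):

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Illustrative unified-hierarchy entry: hierarchy 0, empty controller list.
	out := "0::/kubepods.slice/kubepods-pod42.slice/cri-abc123.scope"
	parts := strings.SplitN(strings.TrimSpace(out), ":", 3)
	if len(parts) == 3 && parts[0] == "0" && parts[1] == "" {
		fmt.Println("group path:", parts[2])
	}
}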
@ -555,14 +636,13 @@ func addProcessToCgroup(pid int, control interface{}, groupPath string) error {
return cgroup1.Add(cgroups.Process{Pid: pid})
}
func injectChaos(t targetDetails, stressors, stressType string) (*exec.Cmd, error) {
stressCommand := fmt.Sprintf("pause nsutil -t %v -p -- %v", strconv.Itoa(t.Pid), stressors)
func injectChaos(t *targetDetails, stressors string, index int, stressType string) (*Command, error) {
stressCommand := fmt.Sprintf("pause nsutil -t %v -p -- %v", strconv.Itoa(t.Pids[index]), stressors)
// for io stress, we need to enter the mount ns of the target container
// enabling it by passing -m flag
if stressType == "pod-io-stress" {
stressCommand = fmt.Sprintf("pause nsutil -t %v -p -m -- %v", strconv.Itoa(t.Pid), stressors)
stressCommand = fmt.Sprintf("pause nsutil -t %v -p -m -- %v", strconv.Itoa(t.Pids[index]), stressors)
}
log.Infof("[Info]: starting process: %v", stressCommand)
// launch the stress-ng process on the target container in paused mode
@ -570,17 +650,18 @@ func injectChaos(t targetDetails, stressors, stressType string) (*exec.Cmd, erro
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
var buf bytes.Buffer
cmd.Stdout = &buf
cmd.Stderr = &buf
err = cmd.Start()
if err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("failed to start stress process: %s", err.Error())}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("failed to start stress process: %s", err.Error())}
}
// add the stress process to the cgroup of target container
if err = addProcessToCgroup(cmd.Process.Pid, t.CGroupManager, t.GroupPath); err != nil {
if err = addProcessToCgroup(cmd.Process.Pid, t.CGroupManagers[index], t.GroupPath); err != nil {
if killErr := cmd.Process.Kill(); killErr != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("fail to add the stress process to cgroup %s and kill stress process: %s", err.Error(), killErr.Error())}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup %s and kill stress process: %s", err.Error(), killErr.Error())}
}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("fail to add the stress process to cgroup: %s", err.Error())}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to add the stress process to cgroup: %s", err.Error())}
}
log.Info("[Info]: Sending signal to resume the stress process")
@ -590,19 +671,27 @@ func injectChaos(t targetDetails, stressors, stressType string) (*exec.Cmd, erro
// remove pause and resume or start the stress process
if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainer), Reason: fmt.Sprintf("fail to remove pause and start the stress process: %s", err.Error())}
return nil, cerrors.Error{ErrorCode: cerrors.ErrorTypeChaosInject, Source: t.Source, Target: fmt.Sprintf("{podName: %s, namespace: %s, container: %s}", t.Name, t.Namespace, t.TargetContainers[index]), Reason: fmt.Sprintf("fail to remove pause and start the stress process: %s", err.Error())}
}
return cmd, nil
return &Command{
Cmd: cmd,
Buffer: buf,
}, nil
}
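The sequence above is deliberate: the stress process starts paused so it can be attached to the target container's cgroup before it consumes any resources, and only then receives SIGCONT. A stripped-down illustration of that start-attach-resume shape (sleep stands in for the pause+stress wrapper, and the cgroup attach is reduced to a comment):

package main

import (
	"log"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("sleep", "5")
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true} // own process group
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	// In the helper, cmd.Process.Pid is added to the target's cgroup here,
	// while the child is still held in its paused state.
	if err := cmd.Process.Signal(syscall.SIGCONT); err != nil {
		log.Fatal(err)
	}
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
}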
type targetDetails struct {
Name string
Namespace string
TargetContainer string
ContainerId string
Pid int
CGroupManager interface{}
Cmd *exec.Cmd
Source string
GroupPath string
Name string
Namespace string
TargetContainers []string
ContainerIds []string
Pids []int
CGroupManagers []interface{}
Cmds []*Command
Source string
GroupPath string
}
type Command struct {
Cmd *exec.Cmd
Buffer bytes.Buffer
}
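Two things are worth noting in the struct changes above. First, the former scalar fields (TargetContainer, ContainerId, Pid, the cgroup manager, the command) become parallel slices, so one targetDetails now describes every container in the pod and each helper operation takes an index into those slices. Second, the new Command type pairs the running *exec.Cmd with the buffer wired to its stdout/stderr in injectChaos, so the stress process's output can be surfaced later, for example when the process exits with an error.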

View File

@ -16,7 +16,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/stress-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/litmuschaos/litmus-go/pkg/utils/stringutils"
@ -80,7 +79,7 @@ func PrepareAndInjectStressChaos(ctx context.Context, experimentsDetails *experi
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not experiment service account")
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
@ -139,27 +138,8 @@ func injectChaosInSerialMode(ctx context.Context, experimentsDetails *experiment
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for stress chaos
log.Info("[Cleanup]: Deleting the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}
return nil
@ -170,7 +150,6 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
ctx, span := otel.Tracer(telemetry.TracerName).Start(ctx, "InjectPodStressFaultInParallelMode")
defer span.End()
var err error
// run the probes during chaos
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, chaosDetails, clients, resultDetails, "DuringChaos", eventsDetails); err != nil {
@ -194,27 +173,8 @@ func injectChaosInParallelMode(ctx context.Context, experimentsDetails *experime
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pods, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pods")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, chaosDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for stress chaos
log.Info("[Cleanup]: Deleting all the helper pod")
err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients)
if err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
return nil
@ -305,10 +265,10 @@ func createHelperPod(ctx context.Context, experimentsDetails *experimentTypes.Ex
helperPod.Spec.Volumes = append(helperPod.Spec.Volumes, common.GetSidecarVolumes(chaosDetails)...)
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaosNamespace).Create(context.Background(), helperPod, v1.CreateOptions{})
if err != nil {
if err := clients.CreatePod(experimentsDetails.ChaosNamespace, helperPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create helper pod: %s", err.Error())}
}
return nil
}
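createHelperPod now routes through clients.CreatePod instead of calling the generated clientset directly, giving the changeset one place to hang cross-cutting behaviour such as retries. The wrapper's actual body is not shown in this diff; a hedged sketch of what a retrying variant could look like with client-go (names, attempt count, and delay are assumptions):

package chaosclients

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createPodWithRetry is an illustrative stand-in for clients.CreatePod.
func createPodWithRetry(kc kubernetes.Interface, namespace string, pod *corev1.Pod) error {
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if _, err = kc.CoreV1().Pods(namespace).Create(context.Background(), pod, metav1.CreateOptions{}); err == nil {
			return nil
		}
		time.Sleep(2 * time.Second) // fixed backoff between attempts
	}
	return err
}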

View File

@ -44,7 +44,7 @@ func experimentExecution(ctx context.Context, experimentsDetails *experimentType
if experimentsDetails.ChaosServiceAccount == "" {
experimentsDetails.ChaosServiceAccount, err = common.GetServiceAccount(experimentsDetails.ChaosNamespace, experimentsDetails.ChaosPodName, clients)
if err != nil {
return stacktrace.Propagate(err, "could not experiment service account")
return stacktrace.Propagate(err, "could not get experiment service account")
}
}
@ -95,28 +95,10 @@ func runChaos(ctx context.Context, experimentsDetails *experimentTypes.Experimen
common.SetTargets(pod.Name, "targeted", "pod", chaosDetails)
appLabel := fmt.Sprintf("app=%s-helper-%s", experimentsDetails.ExperimentName, runID)
//checking the status of the helper pod, wait till the pod comes to running state else fail the experiment
log.Info("[Status]: Checking the status of the helper pod")
if err := status.CheckHelperStatus(experimentsDetails.ChaosNamespace, appLabel, experimentsDetails.Timeout, experimentsDetails.Delay, clients); err != nil {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return stacktrace.Propagate(err, "could not check helper status")
}
// Wait till the completion of the helper pod
// set an upper limit for the waiting time
log.Info("[Wait]: waiting till the completion of the helper pod")
podStatus, err := status.WaitForCompletion(experimentsDetails.ChaosNamespace, appLabel, clients, experimentsDetails.ChaosDuration+experimentsDetails.Timeout, common.GetContainerNames(chaosDetails)...)
if err != nil || podStatus == "Failed" {
common.DeleteAllHelperPodBasedOnJobCleanupPolicy(appLabel, chaosDetails, clients)
return common.HelperFailedError(err, appLabel, experimentsDetails.ChaosNamespace, true)
}
//Deleting all the helper pod for container-kill chaos
log.Info("[Cleanup]: Deleting all the helper pods")
if err = common.DeleteAllPod(appLabel, experimentsDetails.ChaosNamespace, chaosDetails.Timeout, chaosDetails.Delay, clients); err != nil {
return stacktrace.Propagate(err, "could not delete helper pod(s)")
if err := common.ManagerHelperLifecycle(appLabel, chaosDetails, clients, true); err != nil {
return err
}
}

View File

@ -39,7 +39,7 @@ func Experiment(ctx context.Context, clients clients.ClientSets){
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -41,7 +41,7 @@ func Experiment(ctx context.Context, clients clients.ClientSets){
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func Experiment(clients clients.ClientSets){
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func Experiment(ctx context.Context, clients clients.ClientSets){
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -40,7 +40,7 @@ func AWSSSMChaosByID(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}

View File

@ -40,7 +40,7 @@ func AWSSSMChaosByTag(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}

View File

@ -41,7 +41,7 @@ func AzureDiskLoss(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err = types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err = common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}

View File

@ -42,7 +42,7 @@ func AzureInstanceStop(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err = types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err = common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
}
}

View File

@ -40,7 +40,7 @@ func NodeRestart(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -42,7 +42,7 @@ func CasssandraPodDelete(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.ChaoslibDetail.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err = types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err = common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -45,7 +45,7 @@ func GCPVMDiskLossByLabel(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -45,7 +45,7 @@ func VMDiskLoss(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err = types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err = common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -45,7 +45,7 @@ func GCPVMInstanceStopByLabel(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -45,7 +45,7 @@ func VMInstanceStop(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func ContainerKill(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func DiskFill(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func DockerServiceKill(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func KubeletServiceKill(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func NodeCPUHog(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func NodeDrain(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func NodeIOStress(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func NodeMemoryHog(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func NodeRestart(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func NodeTaint(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodAutoscaler(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodCPUHogExec(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodCPUHog(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -38,7 +38,7 @@ func PodDelete(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodDNSError(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -40,7 +40,7 @@ func PodDNSSpoof(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err = types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err = common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodFioStress(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodHttpLatency(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodHttpModifyBody(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodHttpModifyHeader(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodHttpResetPeer(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -40,7 +40,7 @@ func PodHttpStatusCode(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodIOStress(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodMemoryHogExec(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodMemoryHog(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodNetworkCorruption(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodNetworkDuplication(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodNetworkLatency(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -38,7 +38,7 @@ func PodNetworkLoss(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func PodNetworkPartition(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("unable to initialize the probes, err: %v", err)
return
}

View File

@ -0,0 +1,14 @@
## Experiment Metadata
<table>
<tr>
<th> Name </th>
<th> Description </th>
<th> Documentation Link </th>
</tr>
<tr>
<td> Pod Network Rate Limit </td>
<td> This experiment adds network rate limiting for target application pods. </td>
<td> <a href="https://litmuschaos.github.io/litmus/experiments/categories/pods/pod-network-rate-limit/"> Here </a> </td>
</tr>
</table>
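Mechanically, this class of fault is usually realized with tc's token bucket filter (tbf) inside the target pod's network namespace; the NETWORK_BANDWIDTH, BURST, and LIMIT values in the test manifest later in this diff map onto tbf's rate, burst, and limit parameters (see tc-tbf(8)). A sketch of composing that command; entering the pod's netns is out of scope here and left as a comment:

package main

import "fmt"

func main() {
	// Values mirror NETWORK_BANDWIDTH, BURST and LIMIT from the test manifest.
	iface, rate, burst, limit := "eth0", "1mbit", "32kb", "2mb"
	// The helper would run this inside the target pod's network namespace.
	cmd := fmt.Sprintf("tc qdisc replace dev %s root tbf rate %s burst %s limit %s",
		iface, rate, burst, limit)
	fmt.Println(cmd)
}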

View File

@ -0,0 +1,177 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/network-chaos/lib/rate"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/generic/network-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/status"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
)
// PodNetworkRateLimit injects the pod-network-rate-limit chaos
func PodNetworkRateLimit(ctx context.Context, clients clients.ClientSets) {
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
chaosDetails := types.ChaosDetails{}
eventsDetails := types.EventDetails{}
//Fetching all the ENV passed from the runner pod
log.Infof("[PreReq]: Getting the ENV for the %v experiment", os.Getenv("EXPERIMENT_NAME"))
experimentEnv.GetENV(&experimentsDetails, "pod-network-rate-limit")
// Initialize events Parameters
types.InitialiseChaosVariables(&chaosDetails)
// Initialize Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}
}
//Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT"); err != nil {
log.Errorf("Unable to Create the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
// generating the event in chaosresult to mark the verdict as awaited
msg := "experiment: " + experimentsDetails.ExperimentName + ", Result: Awaited"
types.SetResultEventAttributes(&eventsDetails, types.AwaitedVerdict, msg, "Normal", &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
//DISPLAY THE APP INFORMATION
log.InfoWithValues("The application information is as follows\n", logrus.Fields{
"Targets": common.GetAppDetailsForLogging(chaosDetails.AppDetail),
"Target Container": experimentsDetails.TargetContainer,
"Chaos Duration": experimentsDetails.ChaosDuration,
"Container Runtime": experimentsDetails.ContainerRuntime,
"Network Bandwidth": experimentsDetails.NetworkBandwidth,
"Burst": experimentsDetails.Burst,
"Limit": experimentsDetails.Limit,
})
// Calling AbortWatcher go routine, it will continuously watch for the abort signal and generate the required events and result
go common.AbortWatcher(experimentsDetails.ExperimentName, clients, &resultDetails, &chaosDetails, &eventsDetails)
//PRE-CHAOS APPLICATION STATUS CHECK
if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (pre-chaos)")
if err := status.AUTStatusCheck(clients, &chaosDetails); err != nil {
log.Errorf("Application status check failed, err: %v", err)
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := common.GetStatusMessage(chaosDetails.DefaultHealthCheck, "AUT: Running", "")
// run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails); err != nil {
log.Errorf("Probe Failed, err: %v", err)
msg := common.GetStatusMessage(chaosDetails.DefaultHealthCheck, "AUT: Running", "Unsuccessful")
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = common.GetStatusMessage(chaosDetails.DefaultHealthCheck, "AUT: Running", "Successful")
}
// generating the events for the pre-chaos check
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
chaosDetails.Phase = types.ChaosInjectPhase
if err := litmusLIB.PodNetworkRateChaos(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
//POST-CHAOS APPLICATION STATUS CHECK
if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the AUT (Application Under Test) is running (post-chaos)")
if err := status.AUTStatusCheck(clients, &chaosDetails); err != nil {
log.Infof("Application status check failed, err: %v", err)
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, "AUT: Not Running", "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
}
if experimentsDetails.EngineName != "" {
// marking AUT as running, as we already checked the status of application under test
msg := common.GetStatusMessage(chaosDetails.DefaultHealthCheck, "AUT: Running", "")
// run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err := probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails); err != nil {
log.Errorf("Probes Failed, err: %v", err)
msg := common.GetStatusMessage(chaosDetails.DefaultHealthCheck, "AUT: Running", "Unsuccessful")
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = common.GetStatusMessage(chaosDetails.DefaultHealthCheck, "AUT: Running", "Successful")
}
// generating post chaos event
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
//Updating the chaosResult in the end of experiment
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err := result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT"); err != nil {
log.Errorf("Unable to Update the Chaos Result, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult")
if experimentsDetails.EngineName != "" {
msg := experimentsDetails.ExperimentName + " experiment has been " + string(resultDetails.Verdict) + "ed"
types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails)
events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine")
}
}

View File

@ -0,0 +1,48 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-network-rate-limit-sa
  namespace: default
  labels:
    name: pod-network-rate-limit-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-network-rate-limit-sa
  namespace: default
  labels:
    name: pod-network-rate-limit-sa
rules:
- apiGroups: [""]
  resources: ["pods","events"]
  verbs: ["create","list","get","patch","update","delete","deletecollection"]
- apiGroups: [""]
  resources: ["pods/exec","pods/log"]
  verbs: ["list","get","create"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create","list","get","delete","deletecollection"]
- apiGroups: ["networking.k8s.io"]
  resources: ["networkpolicies"]
  verbs: ["create","delete","list","get"]
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines","chaosexperiments","chaosresults"]
  verbs: ["create","list","get","patch","update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-network-rate-limit-sa
  namespace: default
  labels:
    name: pod-network-rate-limit-sa
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-network-rate-limit-sa
subjects:
- kind: ServiceAccount
  name: pod-network-rate-limit-sa
  namespace: default

View File

@ -0,0 +1,95 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litmus-experiment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litmus-experiment
  template:
    metadata:
      labels:
        app: litmus-experiment
    spec:
      serviceAccountName: pod-network-rate-limit-sa
      containers:
      - name: gotest
        image: busybox
        command:
        - sleep
        - "3600"
        env:
        - name: APP_NAMESPACE
          value: 'default'
        - name: APP_LABEL
          value: 'run=nginx'
        - name: TARGET_CONTAINER
          value: 'nginx'
        - name: APP_KIND
          value: 'deployment'
        - name: NETWORK_INTERFACE
          value: 'eth0'
        - name: TC_IMAGE
          value: 'gaiadocker/iproute2'
        - name: NETWORK_BANDWIDTH
          value: "1mbit"
        - name: BURST
          value: "32kb"
        - name: LIMIT
          value: "2mb"
        - name: MIN_BURST
          value: ""
        - name: PEAK_RATE
          value: ""
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        - name: TARGET_POD
          value: ''
        - name: LIB_IMAGE
          value: 'litmuschaos/go-runner:ci'
        - name: CHAOS_NAMESPACE
          value: 'default'
        - name: RAMP_TIME
          value: ''
        ## percentage of total pods to target
        - name: PODS_AFFECTED_PERC
          value: ''
        # provide the name of container runtime
        # it supports docker, containerd, crio
        # defaults to containerd
        - name: CONTAINER_RUNTIME
          value: 'containerd'
        # provide the container runtime path
        # applicable only for containerd and crio runtime
        - name: SOCKET_PATH
          value: '/run/containerd/containerd.sock'
        - name: CHAOS_SERVICE_ACCOUNT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name

View File

@ -41,7 +41,7 @@ func KafkaBrokerPodFailure(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.ChaoslibDetail.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -40,7 +40,7 @@ func EBSLossByID(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err = types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err = common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func EBSLossByTag(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}

View File

@ -45,7 +45,7 @@ func EC2TerminateByID(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err = types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err = common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}

View File

@ -44,7 +44,7 @@ func EC2TerminateByTag(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err = types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err = common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}

View File

@ -0,0 +1,14 @@
## Experiment Metadata
<table>
<tr>
<th> Name </th>
<th> Description </th>
<th> Documentation Link </th>
</tr>
<tr>
<td> RDS Instance Stop </td>
<td> This experiment stops an RDS instance, identified by its instance identifier, for the specified chaos duration and then brings it back to the available state. The number of target instances can also be controlled using the instance-affected percentage. </td>
<td> <a href="https://litmuschaos.github.io/litmus/experiments/categories/aws/rds-instance-stop/"> Here </a> </td>
</tr>
</table>
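The primitive underneath this experiment is the RDS API's StopDBInstance/StartDBInstance pair. A hedged sketch of the stop side with aws-sdk-go v1, assuming region and credentials come from the usual SDK resolution chain; the helper is illustrative, not the experiment's own code:

package rdschaos

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/rds"
)

// stopInstance asks RDS to stop one instance by its identifier.
func stopInstance(region, instanceID string) error {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String(region)}))
	_, err := rds.New(sess).StopDBInstance(&rds.StopDBInstanceInput{
		DBInstanceIdentifier: aws.String(instanceID),
	})
	return err
}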

View File

@ -0,0 +1,190 @@
package experiment
import (
"context"
"os"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
litmusLIB "github.com/litmuschaos/litmus-go/chaoslib/litmus/rds-instance-stop/lib"
"github.com/litmuschaos/litmus-go/pkg/clients"
aws "github.com/litmuschaos/litmus-go/pkg/cloud/aws/rds"
"github.com/litmuschaos/litmus-go/pkg/events"
experimentEnv "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/environment"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/probe"
"github.com/litmuschaos/litmus-go/pkg/result"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/common"
"github.com/sirupsen/logrus"
)
// RDSInstanceStop will stop an AWS RDS instance
func RDSInstanceStop(ctx context.Context, clients clients.ClientSets) {
var (
err error
)
experimentsDetails := experimentTypes.ExperimentDetails{}
resultDetails := types.ResultDetails{}
eventsDetails := types.EventDetails{}
chaosDetails := types.ChaosDetails{}
// Fetching all the ENV passed from the runner pod
log.Infof("[PreReq]: Getting the ENV for the %v experiment", os.Getenv("EXPERIMENT_NAME"))
experimentEnv.GetENV(&experimentsDetails)
// Initialize the chaos attributes
types.InitialiseChaosVariables(&chaosDetails)
// Initialize Chaos Result Parameters
types.SetResultAttributes(&resultDetails, chaosDetails)
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err = common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}
}
// Updating the chaos result in the beginning of experiment
log.Infof("[PreReq]: Updating the chaos result of %v experiment (SOT)", experimentsDetails.ExperimentName)
if err = result.ChaosResult(&chaosDetails, clients, &resultDetails, "SOT"); err != nil {
log.Errorf("Unable to create the chaosresult: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
// Set the chaos result uid
result.SetResultUID(&resultDetails, clients, &chaosDetails)
// Generating the event in chaosresult to mark the verdict as awaited
msg := "experiment: " + experimentsDetails.ExperimentName + ", Result: Awaited"
types.SetResultEventAttributes(&eventsDetails, types.AwaitedVerdict, msg, "Normal", &resultDetails)
if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosresult", types.AwaitedVerdict)
}
// DISPLAY THE INSTANCE INFORMATION
log.InfoWithValues("The instance information is as follows", logrus.Fields{
"Chaos Duration": experimentsDetails.ChaosDuration,
"Chaos Namespace": experimentsDetails.ChaosNamespace,
"Instance Identifier": experimentsDetails.RDSInstanceIdentifier,
"Instance Affected Percentage": experimentsDetails.InstanceAffectedPerc,
"Sequence": experimentsDetails.Sequence,
})
// Calling AbortWatcher go routine, it will continuously watch for the abort signal and generate the required events and result
go common.AbortWatcherWithoutExit(experimentsDetails.ExperimentName, clients, &resultDetails, &chaosDetails, &eventsDetails)
if experimentsDetails.EngineName != "" {
// Marking AUT as running, as we already checked the status of application under test
msg := "AUT: Running"
// Run the probes in the pre-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PreChaos", &eventsDetails); err != nil {
log.Errorf("Probe Failed: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Warning", &chaosDetails)
if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosengine", types.PreChaosCheck)
}
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// Generating the events for the pre-chaos check
types.SetEngineEventAttributes(&eventsDetails, types.PreChaosCheck, msg, "Normal", &chaosDetails)
if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosengine", types.PreChaosCheck)
}
}
// Verify the aws rds instance is available (pre-chaos)
if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the aws rds instances are in available state (pre-chaos)")
if err = aws.InstanceStatusCheckByInstanceIdentifier(experimentsDetails.RDSInstanceIdentifier, experimentsDetails.Region); err != nil {
log.Errorf("RDS instance status check failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: RDS instance is in available state (pre-chaos)")
}
chaosDetails.Phase = types.ChaosInjectPhase
if err = litmusLIB.PrepareRDSInstanceStop(ctx, &experimentsDetails, clients, &resultDetails, &eventsDetails, &chaosDetails); err != nil {
log.Errorf("Chaos injection failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Infof("[Confirmation]: %v chaos has been injected successfully", experimentsDetails.ExperimentName)
resultDetails.Verdict = v1alpha1.ResultVerdictPassed
chaosDetails.Phase = types.PostChaosPhase
// Verify the aws rds instance is available (post-chaos)
if chaosDetails.DefaultHealthCheck {
log.Info("[Status]: Verify that the aws rds instances are in available state (post-chaos)")
if err = aws.InstanceStatusCheckByInstanceIdentifier(experimentsDetails.RDSInstanceIdentifier, experimentsDetails.Region); err != nil {
log.Errorf("RDS instance status check failed, err: %v", err)
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
log.Info("[Status]: RDS instance is in available state (post-chaos)")
}
if experimentsDetails.EngineName != "" {
// Marking AUT as running, as we already checked the status of the application under test
msg := "AUT: Running"
// Run the probes in the post-chaos check
if len(resultDetails.ProbeDetails) != 0 {
if err = probe.RunProbes(ctx, &chaosDetails, clients, &resultDetails, "PostChaos", &eventsDetails); err != nil {
log.Errorf("Probes Failed: %v", err)
msg := "AUT: Running, Probes: Unsuccessful"
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Warning", &chaosDetails)
if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosengine", types.PostChaosCheck)
}
result.RecordAfterFailure(&chaosDetails, &resultDetails, err, clients, &eventsDetails)
return
}
msg = "AUT: Running, Probes: Successful"
}
// Generating post chaos event
types.SetEngineEventAttributes(&eventsDetails, types.PostChaosCheck, msg, "Normal", &chaosDetails)
if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosengine", types.PostChaosCheck)
}
}
// Updating the chaosResult at the end of the experiment
log.Infof("[The End]: Updating the chaos result of %v experiment (EOT)", experimentsDetails.ExperimentName)
if err = result.ChaosResult(&chaosDetails, clients, &resultDetails, "EOT"); err != nil {
log.Errorf("Unable to update the chaosresult: %v", err)
return
}
// Generating the event in chaosresult to mark the verdict as pass/fail
msg = "experiment: " + experimentsDetails.ExperimentName + ", Result: " + string(resultDetails.Verdict)
reason, eventType := types.GetChaosResultVerdictEvent(resultDetails.Verdict)
types.SetResultEventAttributes(&eventsDetails, reason, msg, eventType, &resultDetails)
if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosResult"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosresult", reason)
}
if experimentsDetails.EngineName != "" {
msg := experimentsDetails.ExperimentName + " experiment has been " + string(resultDetails.Verdict) + "ed"
types.SetEngineEventAttributes(&eventsDetails, types.Summary, msg, "Normal", &chaosDetails)
if eventErr := events.GenerateEvents(&eventsDetails, clients, &chaosDetails, "ChaosEngine"); eventErr != nil {
log.Errorf("Failed to create %v event inside chaosengine", types.Summary)
}
}
}
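For context, a minimal runner sketch for invoking this experiment from an experiment binary. The import path follows the conventional litmus-go experiment layout and is an assumption, as is the exact package name; GenerateClientSetFromKubeConfig is the usual ClientSets bootstrap helper:

package main

import (
	"context"

	// Assumed import path, following the conventional litmus-go experiment layout.
	experiment "github.com/litmuschaos/litmus-go/experiments/kube-aws/rds-instance-stop/experiment"
	"github.com/litmuschaos/litmus-go/pkg/clients"
	"github.com/litmuschaos/litmus-go/pkg/log"
)

func main() {
	// Build the Kubernetes and Litmus clientsets from the kubeconfig (in-cluster or local).
	clientSets := clients.ClientSets{}
	if err := clientSets.GenerateClientSetFromKubeConfig(); err != nil {
		log.Errorf("Unable to create the client sets, err: %v", err)
		return
	}
	// The experiment reads all of its tunables from the pod environment
	// (see the environment.GetENV helper later in this diff).
	experiment.RDSInstanceStop(context.Background(), clientSets)
}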

View File

@ -0,0 +1,49 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rds-instance-stop-sa
  namespace: default
  labels:
    name: rds-instance-stop-sa
    app.kubernetes.io/part-of: litmus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rds-instance-stop-sa
  labels:
    name: rds-instance-stop-sa
    app.kubernetes.io/part-of: litmus
rules:
- apiGroups: [""]
  resources: ["pods","events","secrets"]
  verbs: ["create","list","get","patch","update","delete","deletecollection"]
- apiGroups: [""]
  resources: ["pods/exec","pods/log"]
  verbs: ["create","list","get"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create","list","get","delete","deletecollection"]
- apiGroups: ["litmuschaos.io"]
  resources: ["chaosengines","chaosexperiments","chaosresults"]
  verbs: ["create","list","get","patch","update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["patch","get","list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rds-instance-stop-sa
  labels:
    name: rds-instance-stop-sa
    app.kubernetes.io/part-of: litmus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rds-instance-stop-sa
subjects:
- kind: ServiceAccount
  name: rds-instance-stop-sa
  namespace: default

View File

@ -0,0 +1,46 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litmus-experiment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litmus-experiment
  template:
    metadata:
      labels:
        app: litmus-experiment
    spec:
      serviceAccountName: rds-instance-stop-sa
      containers:
      - name: gotest
        image: busybox
        command:
        - sleep
        - "3600"
        env:
        - name: CHAOS_NAMESPACE
          value: 'default'
        - name: RDS_INSTANCE_IDENTIFIER
          value: ''
        - name: REGION
          value: ''
        - name: RAMP_TIME
          value: ''
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: cloud-secret
          mountPath: /tmp/
      volumes:
      - name: cloud-secret
        secret:
          secretName: cloud-secret

View File

@ -39,7 +39,7 @@ func Experiment(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -39,7 +39,7 @@ func Experiment(ctx context.Context, clients clients.ClientSets, expName string)
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes, err: %v", err)
return
}

View File

@ -42,7 +42,7 @@ func VMPoweroff(ctx context.Context, clients clients.ClientSets) {
if experimentsDetails.EngineName != "" {
// Get values from chaosengine. Bail out upon error, as we haven't entered exp business logic yet
if err := types.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
if err := common.GetValuesFromChaosEngine(&chaosDetails, clients, &resultDetails); err != nil {
log.Errorf("Unable to initialize the probes: %v", err)
return
}

5
go.mod
View File

@ -20,7 +20,7 @@ require (
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.27.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.27.0
go.opentelemetry.io/otel/sdk v1.27.0
go.opentelemetry.io/otel/trace v1.27.0
golang.org/x/net v0.25.0
google.golang.org/api v0.169.0
gopkg.in/yaml.v2 v2.4.0
k8s.io/api v0.30.1
@ -45,6 +45,7 @@ require (
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/dimchansky/utfbom v1.1.1 // indirect
github.com/docker/go-units v0.4.0 // indirect
github.com/evanphx/json-patch v5.9.11+incompatible // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/go-logr/logr v1.4.1 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
@ -75,9 +76,9 @@ require (
go.opencensus.io v0.24.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.49.0 // indirect
go.opentelemetry.io/otel/metric v1.27.0 // indirect
go.opentelemetry.io/otel/trace v1.27.0 // indirect
go.opentelemetry.io/proto/otlp v1.2.0 // indirect
golang.org/x/crypto v0.23.0 // indirect
golang.org/x/net v0.25.0 // indirect
golang.org/x/oauth2 v0.20.0 // indirect
golang.org/x/sys v0.20.0 // indirect
golang.org/x/term v0.20.0 // indirect

2
go.sum
View File

@ -146,6 +146,8 @@ github.com/envoyproxy/protoc-gen-validate v0.1.0/go.mod h1:iSmxcyjqTsJpI2R4NaDN7
github.com/evanphx/json-patch v0.5.2/go.mod h1:ZWS5hhDbVDyob71nXKNL0+PWn6ToqBHMikGIFbs31qQ=
github.com/evanphx/json-patch v4.9.0+incompatible/go.mod h1:50XU6AFN0ol/bzJsmQLiYLvXMP4fmwYFNcr97nuDLSk=
github.com/evanphx/json-patch v4.11.0+incompatible/go.mod h1:50XU6AFN0ol/bzJsmQLiYLvXMP4fmwYFNcr97nuDLSk=
github.com/evanphx/json-patch v5.9.11+incompatible h1:ixHHqfcGvxhWkniF1tWxBHA0yb4Z+d1UQi45df52xW8=
github.com/evanphx/json-patch v5.9.11+incompatible/go.mod h1:50XU6AFN0ol/bzJsmQLiYLvXMP4fmwYFNcr97nuDLSk=
github.com/fatih/color v1.7.0/go.mod h1:Zm6kSWBoL9eyXnKyktHP6abPY2pDugNf5KwzbycvMj4=
github.com/felixge/httpsnoop v1.0.4 h1:NFTV2Zj1bL4mc9sqWACXbQFVBBg2W3GPvqp8/ESS2Wg=
github.com/felixge/httpsnoop v1.0.4/go.mod h1:m8KPJKqk1gH5J9DgRY2ASl2lWCfGKXixSwevea8zH2U=

View File

@ -95,8 +95,7 @@ func LivenessCleanup(experimentsDetails *experimentTypes.ExperimentDetails, clie
// GetLivenessPodResourceVersion will return the resource version of the liveness pod
func GetLivenessPodResourceVersion(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
livenessPods, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).List(context.Background(), metav1.ListOptions{LabelSelector: "name=cassandra-liveness-deploy-" + experimentsDetails.RunID})
livenessPods, err := clients.ListPods(experimentsDetails.ChaoslibDetail.AppNS, fmt.Sprintf("name=cassandra-liveness-deploy-%s", experimentsDetails.RunID))
if err != nil {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("failed to get the liveness pod, %s", err.Error())}
} else if len(livenessPods.Items) == 0 {
@ -109,12 +108,10 @@ func GetLivenessPodResourceVersion(experimentsDetails *experimentTypes.Experimen
// GetServiceClusterIP will return the cluster IP of the liveness service
func GetServiceClusterIP(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
service, err := clients.KubeClient.CoreV1().Services(experimentsDetails.ChaoslibDetail.AppNS).Get(context.Background(), "cassandra-liveness-service-"+experimentsDetails.RunID, metav1.GetOptions{})
service, err := clients.GetService(experimentsDetails.ChaoslibDetail.AppNS, fmt.Sprintf("cassandra-liveness-service-%s", experimentsDetails.RunID))
if err != nil {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("failed to fetch the liveness service, %s", err.Error())}
}
return service.Spec.ClusterIP, nil
}
@ -328,8 +325,8 @@ func CreateLivenessPod(experimentsDetails *experimentTypes.ExperimentDetails, cl
}
// Creating liveness deployment
_, err := clients.KubeClient.AppsV1().Deployments(experimentsDetails.ChaoslibDetail.AppNS).Create(context.Background(), liveness, metav1.CreateOptions{})
if err != nil {
// Creating liveness deployment
if err := clients.CreateDeployment(experimentsDetails.ChaoslibDetail.AppNS, liveness); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{deploymentName: %s, namespace: %s}", liveness.Name, liveness.Namespace), Reason: fmt.Sprintf("unable to create liveness deployment, %s", err.Error())}
}
log.Info("Liveness Deployment Created successfully!")
@ -366,8 +363,7 @@ func CreateLivenessService(experimentsDetails *experimentTypes.ExperimentDetails
}
// Creating liveness service
_, err := clients.KubeClient.CoreV1().Services(experimentsDetails.ChaoslibDetail.AppNS).Create(context.Background(), livenessSvc, metav1.CreateOptions{})
if err != nil {
if err := clients.CreateService(experimentsDetails.ChaoslibDetail.AppNS, livenessSvc); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Target: fmt.Sprintf("{serviceName: %s, namespace: %s}", livenessSvc.Name, livenessSvc.Namespace), Reason: fmt.Sprintf("unable to create liveness service, %s", err.Error())}
}
log.Info("Liveness service created successfully!")

View File

@ -1,7 +1,6 @@
package cassandra
import (
"context"
"fmt"
"strings"
@ -12,7 +11,6 @@ import (
experimentTypes "github.com/litmuschaos/litmus-go/pkg/cassandra/pod-delete/types"
"github.com/litmuschaos/litmus-go/pkg/clients"
"github.com/litmuschaos/litmus-go/pkg/log"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// NodeToolStatusCheck checks for the distribution of the load on the ring
@ -47,7 +45,7 @@ func NodeToolStatusCheck(experimentsDetails *experimentTypes.ExperimentDetails,
// GetApplicationPodName will return the name of first application pod
func GetApplicationPodName(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (string, error) {
podList, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.ChaoslibDetail.AppLabel})
podList, err := clients.ListPods(experimentsDetails.ChaoslibDetail.AppNS, experimentsDetails.ChaoslibDetail.AppLabel)
if err != nil {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("failed to get the application pod in %v namespace, err: %v", experimentsDetails.ChaoslibDetail.AppNS, err)}
} else if len(podList.Items) == 0 {
@ -59,7 +57,7 @@ func GetApplicationPodName(experimentsDetails *experimentTypes.ExperimentDetails
// GetApplicationReplicaCount will return the replica count of the sts application
func GetApplicationReplicaCount(experimentsDetails *experimentTypes.ExperimentDetails, clients clients.ClientSets) (int, error) {
podList, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.ChaoslibDetail.AppNS).List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.ChaoslibDetail.AppLabel})
podList, err := clients.ListPods(experimentsDetails.ChaoslibDetail.AppNS, experimentsDetails.ChaoslibDetail.AppLabel)
if err != nil {
return 0, cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("failed to get the application pod in %v namespace, err: %v", experimentsDetails.ChaoslibDetail.AppNS, err)}
} else if len(podList.Items) == 0 {

195
pkg/clients/k8s.go Normal file
View File

@ -0,0 +1,195 @@
package clients
import (
"context"
"time"
appsv1 "k8s.io/api/apps/v1"
core_v1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
)
var (
defaultTimeout = 180
defaultDelay = 2
)
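// GetPod fetches the pod with the given name in the given namespace, retrying the API call on error for up to timeout seconds with delay seconds between attempts.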
func (clients *ClientSets) GetPod(namespace, name string, timeout, delay int) (*core_v1.Pod, error) {
var (
pod *core_v1.Pod
err error
)
if err := retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
pod, err = clients.KubeClient.CoreV1().Pods(namespace).Get(context.Background(), name, v1.GetOptions{})
return err
}); err != nil {
return nil, err
}
return pod, nil
}
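// GetAllPod lists all pods in the given namespace, retrying on error with the package-level default timeout and delay.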
func (clients *ClientSets) GetAllPod(namespace string) (*core_v1.PodList, error) {
var (
pods *core_v1.PodList
err error
)
if err := retry.
Times(uint(defaultTimeout / defaultDelay)).
Wait(time.Duration(defaultDelay) * time.Second).
Try(func(attempt uint) error {
pods, err = clients.KubeClient.CoreV1().Pods(namespace).List(context.Background(), v1.ListOptions{})
return err
}); err != nil {
return nil, err
}
return pods, nil
}
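// ListPods lists the pods in the given namespace that match the given label selector, retrying on error with the default timeout and delay.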
func (clients *ClientSets) ListPods(namespace, labels string) (*core_v1.PodList, error) {
var (
pods *core_v1.PodList
err error
)
if err := retry.
Times(uint(defaultTimeout / defaultDelay)).
Wait(time.Duration(defaultDelay) * time.Second).
Try(func(attempt uint) error {
pods, err = clients.KubeClient.CoreV1().Pods(namespace).List(context.Background(), v1.ListOptions{
LabelSelector: labels,
})
return err
}); err != nil {
return nil, err
}
return pods, nil
}
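// GetService fetches the service with the given name in the given namespace, retrying on error with the default timeout and delay.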
func (clients *ClientSets) GetService(namespace, name string) (*core_v1.Service, error) {
var (
service *core_v1.Service
err error
)
if err := retry.
Times(uint(defaultTimeout / defaultDelay)).
Wait(time.Duration(defaultDelay) * time.Second).
Try(func(attempt uint) error {
service, err = clients.KubeClient.CoreV1().Services(namespace).Get(context.Background(), name, v1.GetOptions{})
return err
}); err != nil {
return nil, err
}
return service, nil
}
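// CreatePod creates the given pod in the given namespace, retrying on error with the default timeout and delay.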
func (clients *ClientSets) CreatePod(namespace string, pod *core_v1.Pod) error {
return retry.
Times(uint(defaultTimeout / defaultDelay)).
Wait(time.Duration(defaultDelay) * time.Second).
Try(func(attempt uint) error {
_, err := clients.KubeClient.CoreV1().Pods(namespace).Create(context.Background(), pod, v1.CreateOptions{})
return err
})
}
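// CreateDeployment creates the given deployment in the given namespace, retrying on error with the default timeout and delay.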
func (clients *ClientSets) CreateDeployment(namespace string, deploy *appsv1.Deployment) error {
return retry.
Times(uint(defaultTimeout / defaultDelay)).
Wait(time.Duration(defaultDelay) * time.Second).
Try(func(attempt uint) error {
_, err := clients.KubeClient.AppsV1().Deployments(namespace).Create(context.Background(), deploy, v1.CreateOptions{})
return err
})
}
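// CreateService creates the given service in the given namespace, retrying on error with the default timeout and delay.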
func (clients *ClientSets) CreateService(namespace string, svc *core_v1.Service) error {
return retry.
Times(uint(defaultTimeout / defaultDelay)).
Wait(time.Duration(defaultDelay) * time.Second).
Try(func(attempt uint) error {
_, err := clients.KubeClient.CoreV1().Services(namespace).Create(context.Background(), svc, v1.CreateOptions{})
return err
})
}
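// GetNode fetches the node with the given name, retrying the API call on error for up to timeout seconds with delay seconds between attempts.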
func (clients *ClientSets) GetNode(name string, timeout, delay int) (*core_v1.Node, error) {
var (
node *core_v1.Node
err error
)
if err := retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
node, err = clients.KubeClient.CoreV1().Nodes().Get(context.Background(), name, v1.GetOptions{})
return err
}); err != nil {
return nil, err
}
return node, nil
}
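// GetAllNode lists all nodes in the cluster, retrying the API call on error for up to timeout seconds with delay seconds between attempts.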
func (clients *ClientSets) GetAllNode(timeout, delay int) (*core_v1.NodeList, error) {
var (
nodes *core_v1.NodeList
err error
)
if err := retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
nodes, err = clients.KubeClient.CoreV1().Nodes().List(context.Background(), v1.ListOptions{})
return err
}); err != nil {
return nil, err
}
return nodes, nil
}
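// ListNode lists the nodes that match the given label selector, retrying the API call on error for up to timeout seconds with delay seconds between attempts.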
func (clients *ClientSets) ListNode(labels string, timeout, delay int) (*core_v1.NodeList, error) {
var (
nodes *core_v1.NodeList
err error
)
if err := retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
nodes, err = clients.KubeClient.CoreV1().Nodes().List(context.Background(), v1.ListOptions{
LabelSelector: labels,
})
return err
}); err != nil {
return nil, err
}
return nodes, nil
}
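// UpdateNode updates the given node, retrying on error with the timeout and delay configured in the chaos details.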
func (clients *ClientSets) UpdateNode(chaosDetails *types.ChaosDetails, node *core_v1.Node) error {
return retry.
Times(uint(chaosDetails.Timeout / chaosDetails.Delay)).
Wait(time.Duration(chaosDetails.Delay) * time.Second).
Try(func(attempt uint) error {
_, err := clients.KubeClient.CoreV1().Nodes().Update(context.Background(), node, v1.UpdateOptions{})
return err
})
}
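Taken together, these wrappers let call sites drop their hand-rolled retry loops. A small usage sketch; the namespace, pod name, and label selector are hypothetical, and the function assumes a package that already imports pkg/clients and pkg/log:

// logTargetPods is a hypothetical call site showing the retrying wrappers in use.
func logTargetPods(clientSets *clients.ClientSets) error {
	// Fetch a single pod, retrying for up to 60s with 2s between attempts.
	pod, err := clientSets.GetPod("litmus", "chaos-runner-abc", 60, 2)
	if err != nil {
		return err
	}
	log.Infof("pod %s is scheduled on node %s", pod.Name, pod.Spec.NodeName)

	// List pods by label using the package defaults (180s timeout, 2s delay).
	pods, err := clientSets.ListPods("default", "app=nginx")
	if err != nil {
		return err
	}
	log.Infof("matched %d pods", len(pods.Items))
	return nil
}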

125
pkg/clients/litmus.go Normal file
View File

@ -0,0 +1,125 @@
package clients
import (
"context"
"fmt"
"time"
clientTypes "k8s.io/apimachinery/pkg/types"
"github.com/litmuschaos/chaos-operator/api/litmuschaos/v1alpha1"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/types"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
k8serrors "k8s.io/apimachinery/pkg/api/errors"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
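// GetChaosEngine fetches the chaos engine of the experiment, retrying on error with the timeout and delay configured in the chaos details.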
func (clients *ClientSets) GetChaosEngine(chaosDetails *types.ChaosDetails) (*v1alpha1.ChaosEngine, error) {
var (
engine *v1alpha1.ChaosEngine
err error
)
if err := retry.
Times(uint(chaosDetails.Timeout / chaosDetails.Delay)).
Wait(time.Duration(chaosDetails.Delay) * time.Second).
Try(func(attempt uint) error {
engine, err = clients.LitmusClient.ChaosEngines(chaosDetails.ChaosNamespace).Get(context.Background(), chaosDetails.EngineName, v1.GetOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: err.Error(), Target: fmt.Sprintf("{engineName: %s, engineNs: %s}", chaosDetails.EngineName, chaosDetails.ChaosNamespace)}
}
return nil
}); err != nil {
return nil, err
}
return engine, nil
}
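// UpdateChaosEngine updates the given chaos engine, retrying on error with the timeout and delay configured in the chaos details.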
func (clients *ClientSets) UpdateChaosEngine(chaosDetails *types.ChaosDetails, engine *v1alpha1.ChaosEngine) error {
return retry.
Times(uint(chaosDetails.Timeout / chaosDetails.Delay)).
Wait(time.Duration(chaosDetails.Delay) * time.Second).
Try(func(attempt uint) error {
_, err := clients.LitmusClient.ChaosEngines(chaosDetails.ChaosNamespace).Update(context.Background(), engine, v1.UpdateOptions{})
return err
})
}
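// GetChaosResult fetches the chaos result labeled with the experiment's chaosUID, returning an empty result when none exists yet.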
func (clients *ClientSets) GetChaosResult(chaosDetails *types.ChaosDetails) (*v1alpha1.ChaosResult, error) {
var result v1alpha1.ChaosResult
if err := retry.
Times(uint(chaosDetails.Timeout / chaosDetails.Delay)).
Wait(time.Duration(chaosDetails.Delay) * time.Second).
Try(func(attempt uint) error {
resultList, err := clients.LitmusClient.ChaosResults(chaosDetails.ChaosNamespace).List(context.Background(),
v1.ListOptions{LabelSelector: fmt.Sprintf("chaosUID=%s", chaosDetails.ChaosUID)})
if err != nil {
return err
}
if len(resultList.Items) == 0 {
return nil
}
result = resultList.Items[0]
return nil
}); err != nil {
return nil, err
}
return &result, nil
}
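// CreateChaosResult creates the given chaos result and reports whether a result with the same name already existed.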
func (clients *ClientSets) CreateChaosResult(chaosDetails *types.ChaosDetails, result *v1alpha1.ChaosResult) (bool, error) {
var exists bool
err := retry.
Times(uint(chaosDetails.Timeout / chaosDetails.Delay)).
Wait(time.Duration(chaosDetails.Delay) * time.Second).
Try(func(attempt uint) error {
_, err := clients.LitmusClient.ChaosResults(chaosDetails.ChaosNamespace).Create(context.Background(), result, v1.CreateOptions{})
if err != nil {
if k8serrors.IsAlreadyExists(err) {
exists = true
return nil
}
return err
}
return nil
})
return exists, err
}
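// UpdateChaosResult updates the given chaos result, retrying on error with the timeout and delay configured in the chaos details.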
func (clients *ClientSets) UpdateChaosResult(chaosDetails *types.ChaosDetails, result *v1alpha1.ChaosResult) error {
err := retry.
Times(uint(chaosDetails.Timeout / chaosDetails.Delay)).
Wait(time.Duration(chaosDetails.Delay) * time.Second).
Try(func(attempt uint) error {
_, err := clients.LitmusClient.ChaosResults(chaosDetails.ChaosNamespace).Update(context.Background(), result, v1.UpdateOptions{})
if err != nil {
return err
}
return nil
})
return err
}
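// PatchChaosResult applies a JSON merge patch to the named chaos result, retrying on error with the timeout and delay configured in the chaos details.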
func (clients *ClientSets) PatchChaosResult(chaosDetails *types.ChaosDetails, resultName string, mergePatch []byte) error {
err := retry.
Times(uint(chaosDetails.Timeout / chaosDetails.Delay)).
Wait(time.Duration(chaosDetails.Delay) * time.Second).
Try(func(attempt uint) error {
_, err := clients.LitmusClient.ChaosResults(chaosDetails.ChaosNamespace).Patch(context.TODO(), resultName, clientTypes.MergePatchType, mergePatch, v1.PatchOptions{})
if err != nil {
return err
}
return nil
})
return err
}
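As with the Kubernetes helpers, these consolidate the retry policy for Litmus CRD operations. A sketch of stamping a verdict onto a ChaosResult via PatchChaosResult; the result name is hypothetical, the status path follows the v1alpha1 ChaosResult schema, and the surrounding package is assumed to import pkg/clients and pkg/types:

// markVerdict is a hypothetical call site that patches a verdict onto a ChaosResult.
func markVerdict(clientSets *clients.ClientSets, chaosDetails *types.ChaosDetails) error {
	// JSON merge patch setting the experiment verdict on the result status.
	mergePatch := []byte(`{"status":{"experimentStatus":{"verdict":"Pass"}}}`)
	return clientSets.PatchChaosResult(chaosDetails, "rds-instance-stop-result", mergePatch)
}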

View File

@ -0,0 +1,77 @@
package aws
import (
"fmt"
"strings"
"github.com/aws/aws-sdk-go/service/rds"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/common"
"github.com/litmuschaos/litmus-go/pkg/log"
)
// GetRDSInstanceStatus fetches the current status of the given RDS instance.
func GetRDSInstanceStatus(instanceIdentifier, region string) (string, error) {
var err error
// Load session from shared config
sess := common.GetAWSSession(region)
// Create new RDS client
rdsSvc := rds.New(sess)
// Call to get detailed information on each instance
result, err := rdsSvc.DescribeDBInstances(nil)
if err != nil {
return "", cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Reason: fmt.Sprintf("failed to describe the instances: %v", err),
Target: fmt.Sprintf("{RDS Instance Identifier: %v, Region: %v}", instanceIdentifier, region),
}
}
for _, instanceDetails := range result.DBInstances {
if *instanceDetails.DBInstanceIdentifier == instanceIdentifier {
return *instanceDetails.DBInstanceStatus, nil
}
}
return "", cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Reason: "failed to get the status of RDS instance",
Target: fmt.Sprintf("{RDS Instance Identifier: %v, Region: %v}", instanceIdentifier, region),
}
}
// InstanceStatusCheckByInstanceIdentifier is used to check the status of all the instances under chaos.
func InstanceStatusCheckByInstanceIdentifier(instanceIdentifier, region string) error {
instanceIdentifierList := strings.Split(instanceIdentifier, ",")
if instanceIdentifier == "" || len(instanceIdentifierList) == 0 {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Reason: "no instance identifier provided to stop",
Target: fmt.Sprintf("{RDS Instance Identifier: %v, Region: %v}", instanceIdentifier, region),
}
}
log.Infof("[Info]: The instances under chaos(IUC) are: %v", instanceIdentifierList)
return InstanceStatusCheck(instanceIdentifierList, region)
}
// InstanceStatusCheck verifies that every instance in the given list is in the available state.
func InstanceStatusCheck(instanceIdentifierList []string, region string) error {
for _, id := range instanceIdentifierList {
instanceState, err := GetRDSInstanceStatus(id, region)
if err != nil {
return err
}
if instanceState != "available" {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Reason: fmt.Sprintf("rds instance is not in available state, current state: %v", instanceState),
Target: fmt.Sprintf("{RDS Instance Identifier: %v, Region: %v}", id, region),
}
}
}
return nil
}
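A pre-flight check built on these helpers might look as follows; the identifiers and region are placeholders, and the wrapper sits in the same aws package (stacktrace is already imported by the sibling file below):

// preChaosStatusCheck is a hypothetical pre-flight check using the helper above.
func preChaosStatusCheck() error {
	// Every comma-separated identifier must be an "available" RDS instance
	// before chaos is injected.
	if err := InstanceStatusCheckByInstanceIdentifier("mydb-1,mydb-2", "us-east-1"); err != nil {
		return stacktrace.Propagate(err, "pre-chaos RDS status check failed")
	}
	return nil
}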

View File

@ -0,0 +1,125 @@
package aws
import (
"fmt"
"time"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/service/rds"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/common"
"github.com/litmuschaos/litmus-go/pkg/log"
"github.com/litmuschaos/litmus-go/pkg/utils/retry"
"github.com/palantir/stacktrace"
"github.com/sirupsen/logrus"
)
// RDSInstanceStop will stop an AWS RDS instance
func RDSInstanceStop(identifier, region string) error {
// Load session from shared config
sess := common.GetAWSSession(region)
// Create new RDS client
rdsSvc := rds.New(sess)
input := &rds.StopDBInstanceInput{
DBInstanceIdentifier: aws.String(identifier),
}
result, err := rdsSvc.StopDBInstance(input)
if err != nil {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("failed to stop RDS instance: %v", common.CheckAWSError(err).Error()),
Target: fmt.Sprintf("{RDS Instance Identifier: %v, Region: %v}", identifier, region),
}
}
log.InfoWithValues("Stopping RDS instance:", logrus.Fields{
"DBInstanceStatus": *result.DBInstance.DBInstanceStatus,
"DBInstanceIdentifier": *result.DBInstance.DBInstanceIdentifier,
})
return nil
}
// RDSInstanceStart will start an AWS RDS instance
func RDSInstanceStart(identifier, region string) error {
// Load session from shared config
sess := common.GetAWSSession(region)
// Create new RDS client
rdsSvc := rds.New(sess)
input := &rds.StartDBInstanceInput{
DBInstanceIdentifier: aws.String(identifier),
}
result, err := rdsSvc.StartDBInstance(input)
if err != nil {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("failed to start RDS instance: %v", common.CheckAWSError(err).Error()),
Target: fmt.Sprintf("{RDS Instance Identifier: %v, Region: %v}", identifier, region),
}
}
log.InfoWithValues("Starting RDS instance:", logrus.Fields{
"DBInstanceStatus": *result.DBInstance.DBInstanceStatus,
"DBInstanceIdentifier": *result.DBInstance.DBInstanceIdentifier,
})
return nil
}
// WaitForRDSInstanceDown will wait for the RDS instance to reach the stopped state
func WaitForRDSInstanceDown(timeout, delay int, region, identifier string) error {
log.Info("[Status]: Checking RDS instance status")
return retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
instanceState, err := GetRDSInstanceStatus(identifier, region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the status of RDS instance")
}
log.Infof("The instance state is %v", instanceState)
if instanceState != "stopped" {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Reason: "RDS instance is not in stopped state",
Target: fmt.Sprintf("{RDS Instance Identifier: %v, Region: %v}", identifier, region),
}
}
return nil
})
}
// WaitForRDSInstanceUp will wait for the RDS instance to reach the available state
func WaitForRDSInstanceUp(timeout, delay int, region, identifier string) error {
log.Info("[Status]: Checking RDS instance status")
return retry.
Times(uint(timeout / delay)).
Wait(time.Duration(delay) * time.Second).
Try(func(attempt uint) error {
instanceState, err := GetRDSInstanceStatus(identifier, region)
if err != nil {
return stacktrace.Propagate(err, "failed to get the status of RDS instance")
}
log.Infof("The instance state is %v", instanceState)
if instanceState != "available" {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeStatusChecks,
Reason: "RDS instance is not in available state",
Target: fmt.Sprintf("{RDS Instance Identifier: %v, Region: %v}", identifier, region),
}
}
return nil
})
}
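The chaoslib for this fault would typically chain these primitives: stop the instance, hold the fault for the configured duration, then start it again and wait for recovery. A condensed sketch under that assumption; the identifier, region, and timeouts are illustrative, and the function lives in the same aws package:

// injectAndRecover sketches the stop -> wait -> start -> wait sequence a chaoslib would drive.
func injectAndRecover(chaosDuration time.Duration) error {
	// Inject: stop the instance and wait until AWS reports the "stopped" state.
	if err := RDSInstanceStop("mydb-1", "us-east-1"); err != nil {
		return err
	}
	if err := WaitForRDSInstanceDown(600, 2, "us-east-1", "mydb-1"); err != nil {
		return err
	}

	// Hold the fault for the configured chaos duration.
	time.Sleep(chaosDuration)

	// Recover: start the instance and wait until it is "available" again.
	if err := RDSInstanceStart("mydb-1", "us-east-1"); err != nil {
		return err
	}
	return WaitForRDSInstanceUp(600, 2, "us-east-1", "mydb-1")
}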

View File

@ -2,12 +2,17 @@ package ssm
import (
"fmt"
"math"
"math/rand"
"strconv"
"strings"
"time"
"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/awserr"
"github.com/aws/aws-sdk-go/aws/request"
"github.com/aws/aws-sdk-go/service/ssm"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/aws-ssm/aws-ssm-chaos/types"
"github.com/litmuschaos/litmus-go/pkg/cerrors"
"github.com/litmuschaos/litmus-go/pkg/cloud/aws/common"
@ -23,6 +28,14 @@ const (
DefaultSSMDocsDirectory = "LitmusChaos-AWS-SSM-Docs.yml"
)
// awsErrHasCode checks if an AWS error has a specific error code
func awsErrHasCode(err error, code string) bool {
if aerr, ok := err.(awserr.Error); ok {
return aerr.Code() == code
}
return false
}
// SendSSMCommand will create and add the ssm document in aws service monitoring docs.
func SendSSMCommand(experimentsDetails *experimentTypes.ExperimentDetails, ec2InstanceID []string) (string, error) {
@ -126,48 +139,116 @@ func getSSMCommandStatus(commandID, ec2InstanceID, region string) (string, error
return *cmdOutput.Status, nil
}
// CheckInstanceInformation will check if the instance has permission to do smm api calls
// CheckInstanceInformation checks if the instance has permission to do SSM API calls, handling pagination and API rate limits.
func CheckInstanceInformation(experimentsDetails *experimentTypes.ExperimentDetails) error {
var instanceIDList []string
var input *ssm.DescribeInstanceInformationInput
switch {
case experimentsDetails.EC2InstanceID != "":
// If specific instance IDs are provided, use instance ID filter
instanceIDList = strings.Split(experimentsDetails.EC2InstanceID, ",")
input = &ssm.DescribeInstanceInformationInput{
Filters: []*ssm.InstanceInformationStringFilter{
{
Key: aws.String("InstanceIds"),
Values: aws.StringSlice(instanceIDList),
},
},
}
default:
// If using tags, first verify we have valid targets
if err := CheckTargetInstanceStatus(experimentsDetails); err != nil {
return stacktrace.Propagate(err, "failed to check target instance(s) status")
}
instanceIDList = experimentsDetails.TargetInstanceIDList
// For filtering by instance IDs that we collected from tags
input = &ssm.DescribeInstanceInformationInput{
Filters: []*ssm.InstanceInformationStringFilter{
{
Key: aws.String("InstanceIds"),
Values: aws.StringSlice(instanceIDList),
},
},
}
}
sess := common.GetAWSSession(experimentsDetails.Region)
ssmClient := ssm.New(sess)
for _, ec2ID := range instanceIDList {
res, err := ssmClient.DescribeInstanceInformation(&ssm.DescribeInstanceInformationInput{})
var (
foundInstances = make(map[string]bool)
err error
maxRetries = 5
maxRetryDuration = time.Second * 30
startTime = time.Now()
)
for attempt := 0; attempt < maxRetries; attempt++ {
if time.Since(startTime) > maxRetryDuration {
break
}
err = ssmClient.DescribeInstanceInformationPages(input,
func(page *ssm.DescribeInstanceInformationOutput, lastPage bool) bool {
for _, instanceDetails := range page.InstanceInformationList {
if instanceDetails.InstanceId != nil {
foundInstances[*instanceDetails.InstanceId] = true
}
}
return true // continue to next page
})
if err != nil {
awsErr := common.CheckAWSError(err)
if request.IsErrorThrottle(err) ||
awsErrHasCode(awsErr, "ThrottlingException") ||
awsErrHasCode(awsErr, "RequestThrottledException") ||
awsErrHasCode(awsErr, "Throttling") ||
awsErrHasCode(awsErr, "TooManyRequestsException") ||
awsErrHasCode(awsErr, "RequestLimitExceeded") {
// Calculate exponential backoff with jitter
backoffTime := time.Duration(math.Pow(2, float64(attempt))) * time.Second
rnd := rand.New(rand.NewSource(time.Now().UnixNano()))
jitter := time.Duration(rnd.Intn(1000)) * time.Millisecond
sleepTime := backoffTime + jitter
log.Infof("AWS API rate limit hit, retrying in %v (attempt %d/%d)", sleepTime, attempt+1, maxRetries)
time.Sleep(sleepTime)
continue
}
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("failed to get instance information: %v", common.CheckAWSError(err).Error()),
Reason: fmt.Sprintf("failed to get instance information: %v", awsErr.Error()),
Target: fmt.Sprintf("{Region: %v}", experimentsDetails.Region),
}
}
break
}
if err != nil {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("failed to get instance information after retries: %v", common.CheckAWSError(err).Error()),
Target: fmt.Sprintf("{Region: %v}", experimentsDetails.Region),
}
}
// Validate that each target instance is present.
for _, ec2ID := range instanceIDList {
if _, exists := foundInstances[ec2ID]; !exists {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("the instance %v might not have suitable permission or IAM attached to it. Run command `aws ssm describe-instance-information` to check for available instances", ec2ID),
Target: fmt.Sprintf("{EC2 Instance ID: %v, Region: %v}", ec2ID, experimentsDetails.Region),
}
}
isInstanceFound := false
if len(res.InstanceInformationList) != 0 {
for _, instanceDetails := range res.InstanceInformationList {
if *instanceDetails.InstanceId == ec2ID {
isInstanceFound = true
break
}
}
if !isInstanceFound {
return cerrors.Error{
ErrorCode: cerrors.ErrorTypeChaosInject,
Reason: fmt.Sprintf("the instance %v might not have suitable permission or IAM attached to it. Run command `aws ssm describe-instance-information` to check for available instances", ec2ID),
Target: fmt.Sprintf("{EC2 Instance ID: %v, Region: %v}", ec2ID, experimentsDetails.Region),
}
}
}
}
log.Info("[Info]: The target instance have permission to perform SSM API calls")
return nil
}
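The backoff arithmetic above generalizes beyond SSM. A self-contained sketch of the same exponential-backoff-with-jitter policy, extracted purely for illustration (this helper does not exist in the repo):

package main

import (
	"errors"
	"fmt"
	"math"
	"math/rand"
	"time"
)

// retryWithBackoff retries op until it succeeds, sleeping 2^attempt seconds plus
// up to one second of random jitter between attempts. It stops after maxRetries
// attempts or once maxDuration has elapsed, whichever comes first.
func retryWithBackoff(maxRetries int, maxDuration time.Duration, op func() error) error {
	start := time.Now()
	var err error
	for attempt := 0; attempt < maxRetries; attempt++ {
		if time.Since(start) > maxDuration {
			break
		}
		if err = op(); err == nil {
			return nil
		}
		backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second
		jitter := time.Duration(rand.Intn(1000)) * time.Millisecond
		time.Sleep(backoff + jitter)
	}
	return fmt.Errorf("operation failed after retries: %w", err)
}

func main() {
	// Simulate a throttled API that succeeds on the third call.
	calls := 0
	err := retryWithBackoff(5, 30*time.Second, func() error {
		calls++
		if calls < 3 {
			return errors.New("ThrottlingException")
		}
		return nil
	})
	fmt.Println(calls, err)
}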

View File

@ -37,6 +37,7 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails, expName string
experimentDetails.SetHelperData = types.Getenv("SET_HELPER_DATA", "true")
experimentDetails.SourcePorts = types.Getenv("SOURCE_PORTS", "")
experimentDetails.DestinationPorts = types.Getenv("DESTINATION_PORTS", "")
experimentDetails.Correlation, _ = strconv.Atoi(types.Getenv("CORRELATION", "0"))
switch expName {
case "pod-network-loss":
@ -55,5 +56,13 @@ func GetENV(experimentDetails *experimentTypes.ExperimentDetails, expName string
case "pod-network-duplication":
experimentDetails.NetworkPacketDuplicationPercentage = types.Getenv("NETWORK_PACKET_DUPLICATION_PERCENTAGE", "100")
experimentDetails.NetworkChaosType = "network-duplication"
case "pod-network-rate-limit":
experimentDetails.NetworkBandwidth = types.Getenv("NETWORK_BANDWIDTH", "1mbit")
experimentDetails.Burst = types.Getenv("BURST", "32kb")
experimentDetails.Limit = types.Getenv("LIMIT", "2mb")
experimentDetails.MinBurst = types.Getenv("MIN_BURST", "")
experimentDetails.PeakRate = types.Getenv("PEAK_RATE", "")
experimentDetails.NetworkChaosType = "network-rate-limit"
}
}
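These tunables line up with the parameters of a tc token-bucket filter (tbf). Purely as an illustration of how they might be assembled into a qdisc command inside the network helper; the function name and exact flag layout are assumptions, not the fault's actual implementation:

package main

import "fmt"

// buildRateLimitArgs sketches how the rate-limit tunables could be assembled
// into a tc tbf command, e.g.
// "tc qdisc replace dev eth0 root tbf rate 1mbit burst 32kb limit 2mb".
func buildRateLimitArgs(dev, bandwidth, burst, limit, peakRate, minBurst string) string {
	args := fmt.Sprintf("tc qdisc replace dev %s root tbf rate %s burst %s limit %s",
		dev, bandwidth, burst, limit)
	// peakrate and minburst are optional tbf parameters; append them only when set.
	if peakRate != "" {
		args += " peakrate " + peakRate
	}
	if minBurst != "" {
		args += " minburst " + minBurst
	}
	return args
}

func main() {
	fmt.Println(buildRateLimitArgs("eth0", "1mbit", "32kb", "2mb", "", ""))
}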

View File

@ -26,6 +26,7 @@ type ExperimentDetails struct {
NetworkLatency int
NetworkPacketLossPercentage string
NetworkPacketCorruptionPercentage string
Correlation int
Timeout int
Delay int
TargetPods string
@ -44,4 +45,9 @@ type ExperimentDetails struct {
SetHelperData string
SourcePorts string
DestinationPorts string
NetworkBandwidth string
Burst string
Limit string
PeakRate string
MinBurst string
}

View File

@ -1,7 +1,6 @@
package kafka
import (
"context"
"fmt"
"strconv"
"strings"
@ -61,7 +60,7 @@ func LivenessStream(experimentsDetails *experimentTypes.ExperimentDetails, clien
}
log.Info("[Liveness]: Determine the leader broker pod name")
podList, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.KafkaNamespace).List(context.Background(), metav1.ListOptions{LabelSelector: experimentsDetails.KafkaLabel})
podList, err := clients.ListPods(experimentsDetails.KafkaNamespace, experimentsDetails.KafkaLabel)
if err != nil {
return "", cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Reason: fmt.Sprintf("unable to find the pods with matching labels, err: %v", err)}
}
@ -184,9 +183,9 @@ func CreateLivenessPod(experimentsDetails *experimentTypes.ExperimentDetails, Ka
},
}
_, err := clients.KubeClient.CoreV1().Pods(experimentsDetails.KafkaNamespace).Create(context.Background(), LivenessPod, metav1.CreateOptions{})
if err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeStatusChecks, Reason: fmt.Sprintf("unable to create liveness pod, err: %v", err)}
if err := clients.CreatePod(experimentsDetails.KafkaNamespace, LivenessPod); err != nil {
return cerrors.Error{ErrorCode: cerrors.ErrorTypeGeneric, Reason: fmt.Sprintf("unable to create liveness pod, err: %v", err)}
}
return nil
}

View File

@ -0,0 +1,29 @@
package environment
import (
"strconv"
clientTypes "k8s.io/apimachinery/pkg/types"
experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/types"
"github.com/litmuschaos/litmus-go/pkg/types"
)
// GetENV fetches all the env variables from the runner pod
func GetENV(experimentDetails *experimentTypes.ExperimentDetails) {
experimentDetails.ExperimentName = types.Getenv("EXPERIMENT_NAME", "rds-instance-stop")
experimentDetails.ChaosNamespace = types.Getenv("CHAOS_NAMESPACE", "litmus")
experimentDetails.EngineName = types.Getenv("CHAOSENGINE", "")
experimentDetails.ChaosDuration, _ = strconv.Atoi(types.Getenv("TOTAL_CHAOS_DURATION", "30"))
experimentDetails.ChaosInterval, _ = strconv.Atoi(types.Getenv("CHAOS_INTERVAL", "30"))
experimentDetails.RampTime, _ = strconv.Atoi(types.Getenv("RAMP_TIME", "0"))
experimentDetails.ChaosUID = clientTypes.UID(types.Getenv("CHAOS_UID", ""))
experimentDetails.InstanceID = types.Getenv("INSTANCE_ID", "")
experimentDetails.ChaosPodName = types.Getenv("POD_NAME", "")
experimentDetails.Delay, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_DELAY", "2"))
experimentDetails.Timeout, _ = strconv.Atoi(types.Getenv("STATUS_CHECK_TIMEOUT", "600"))
experimentDetails.RDSInstanceIdentifier = types.Getenv("RDS_INSTANCE_IDENTIFIER", "")
experimentDetails.Region = types.Getenv("REGION", "")
experimentDetails.InstanceAffectedPerc, _ = strconv.Atoi(types.Getenv("INSTANCE_AFFECTED_PERC", "100"))
experimentDetails.Sequence = types.Getenv("SEQUENCE", "parallel")
}
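Since every tunable is read from the environment with a default, a table-style check of the defaults is straightforward. A hypothetical unit test in the spirit of the stringutils and retry tests added earlier in this changelog:

package environment

import (
	"os"
	"testing"

	experimentTypes "github.com/litmuschaos/litmus-go/pkg/kube-aws/rds-instance-stop/types"
)

func TestGetENVDefaults(t *testing.T) {
	// Make sure the variables under test are not set in the environment.
	os.Unsetenv("TOTAL_CHAOS_DURATION")
	os.Unsetenv("SEQUENCE")

	details := experimentTypes.ExperimentDetails{}
	GetENV(&details)

	if details.ChaosDuration != 30 {
		t.Errorf("expected default chaos duration 30, got %d", details.ChaosDuration)
	}
	if details.Sequence != "parallel" {
		t.Errorf("expected default sequence %q, got %q", "parallel", details.Sequence)
	}
}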

Some files were not shown because too many files have changed in this diff.