mpi-operator/sdk/python/v2beta1/docs/V2beta1RunPolicy.md

V2beta1RunPolicy

RunPolicy encapsulates various runtime policies of the distributed training job, for example how to clean up resources and how long the job can stay active.

Properties

| Name | Type | Description | Notes |
| --- | --- | --- | --- |
| active_deadline_seconds | int | Duration in seconds, relative to startTime, that the job may be active before the system tries to terminate it. The value must be a positive integer. | [optional] |
| backoff_limit | int | Number of retries before the job is marked as failed. | [optional] |
| clean_pod_policy | str | Policy for killing pods after the job completes. Defaults to Running. | [optional] |
| managed_by | str | Controller or entity that manages the MPIJob. The value must be empty, 'kubeflow.org/mpi-operator', or 'kueue.x-k8s.io/multikueue'. The mpi-operator reconciles an MPIJob that does not have this field at all or whose value is the reserved string 'kubeflow.org/mpi-operator', and delegates reconciliation of an MPIJob with 'kueue.x-k8s.io/multikueue' to Kueue. The field is immutable. | [optional] |
| scheduling_policy | V2beta1SchedulingPolicy | | [optional] |
| suspend | bool | Whether the MPIJob controller should create Pods. If an MPIJob is created with suspend set to true, the controller creates no Pods. If an MPIJob is suspended after creation (the flag goes from false to true), the controller deletes all active Pods and PodGroups associated with the MPIJob and suspends the Launcher Job. Users must design their workloads to handle this gracefully. Suspending an MPIJob resets its StartTime field. Defaults to false. | [optional] |
| ttl_seconds_after_finished | int | TTL, in seconds, for cleaning up finished jobs. Cleanup may take an extra ReconcilePeriod seconds, because reconciliation runs periodically. Defaults to infinite. | [optional] |
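
The constraints stated for the properties above can be illustrated with a small, self-contained sketch. It does not use the SDK's `V2beta1RunPolicy` class; `build_run_policy` and `reconciled_by_mpi_operator` are hypothetical helpers, and the camelCase keys (`activeDeadlineSeconds`, `managedBy`, and so on) are an assumption about how the fields appear in an MPIJob manifest. Treating an empty `managed_by` the same as an unset one is also an assumption.

```python
from typing import Any, Dict, Optional

# Values the managed_by description allows, besides leaving the field empty.
MPI_OPERATOR = "kubeflow.org/mpi-operator"
MULTIKUEUE = "kueue.x-k8s.io/multikueue"


def build_run_policy(
    active_deadline_seconds: Optional[int] = None,
    backoff_limit: Optional[int] = None,
    clean_pod_policy: Optional[str] = None,
    managed_by: Optional[str] = None,
    scheduling_policy: Optional[Dict[str, Any]] = None,
    suspend: Optional[bool] = None,
    ttl_seconds_after_finished: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return a runPolicy mapping with only the fields that were set."""
    # "value must be positive integer"
    if active_deadline_seconds is not None and active_deadline_seconds <= 0:
        raise ValueError("active_deadline_seconds must be a positive integer")
    # "must be either empty, 'kubeflow.org/mpi-operator' or 'kueue.x-k8s.io/multikueue'"
    if managed_by not in (None, "", MPI_OPERATOR, MULTIKUEUE):
        raise ValueError(f"unsupported managed_by value: {managed_by!r}")

    # Assumption: camelCase keys mirror the manifest's field names.
    fields = {
        "activeDeadlineSeconds": active_deadline_seconds,
        "backoffLimit": backoff_limit,
        "cleanPodPolicy": clean_pod_policy,
        "managedBy": managed_by or None,  # assumption: empty is treated as unset
        "schedulingPolicy": scheduling_policy,
        "suspend": suspend,
        "ttlSecondsAfterFinished": ttl_seconds_after_finished,
    }
    # Every property is optional, so unset fields are simply left out.
    return {key: value for key, value in fields.items() if value is not None}


def reconciled_by_mpi_operator(run_policy: Dict[str, Any]) -> bool:
    """True when managedBy is absent or equals the mpi-operator's reserved string."""
    return run_policy.get("managedBy") in (None, MPI_OPERATOR)


# Example: keep running pods' cleanup policy explicit and expire the job after 5 minutes.
policy = build_run_policy(
    clean_pod_policy="Running",
    ttl_seconds_after_finished=300,
    suspend=False,
)
print(policy)
print(reconciled_by_mpi_operator(policy))
```

Because every property is optional, the helper omits unset fields instead of writing defaults, which leaves the documented defaults (Running, false, infinite) to the operator. Note that `suspend=False` is kept in the output, since only `None` counts as unset.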
