Merge pull request #33536 from alculquicondor/job-failures
Accurate explanation for the calculation of number of failures in Job
commit b6e815beb4
@@ -253,9 +253,19 @@ due to a logical error in configuration etc.
 To do so, set `.spec.backoffLimit` to specify the number of retries before
 considering a Job as failed. The back-off limit is set by default to 6. Failed
 Pods associated with the Job are recreated by the Job controller with an
-exponential back-off delay (10s, 20s, 40s ...) capped at six minutes. The
-back-off count is reset when a Job's Pod is deleted or successful without any
-other Pods for the Job failing around that time.
+exponential back-off delay (10s, 20s, 40s ...) capped at six minutes.
+
+The number of retries is calculated in two ways:
+- The number of Pods with `.status.phase = "Failed"`.
+- When using `restartPolicy = "OnFailure"`, the number of retries in all the
+  containers of Pods with `.status.phase` equal to `Pending` or `Running`.
+
+If either of the calculations reaches the `.spec.backoffLimit`, the Job is
+considered failed.
+
+When the [`JobTrackingWithFinalizers`](#job-tracking-with-finalizers) feature is
+disabled, the number of failed Pods is only based on Pods that are still present
+in the API.
 
 {{< note >}}
 If your job has `restartPolicy = "OnFailure"`, keep in mind that your Pod running the Job
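The two ways of counting retries listed in the new text, together with the `JobTrackingWithFinalizers` caveat, can be modeled with a small sketch. Everything here is hypothetical: `PodRecord` is a simplified stand-in for a Pod (not the Kubernetes API object), `in_api=False` marks a Pod whose object has already been removed, and "reaches the `.spec.backoffLimit`" is interpreted as `>=`, which may not match the controller's exact comparison.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PodRecord:
    """Hypothetical, simplified record of one Pod created for a Job."""

    phase: str  # "Pending", "Running", "Succeeded" or "Failed"
    container_restarts: List[int] = field(default_factory=list)  # one entry per container
    in_api: bool = True  # assumption: False once the Pod object has been removed


def failed_pod_count(pods: List[PodRecord], tracking_with_finalizers: bool = True) -> int:
    """First calculation: Pods with .status.phase = "Failed".

    With JobTrackingWithFinalizers disabled, only Pods still present
    in the API are counted.
    """
    return sum(
        1
        for p in pods
        if p.phase == "Failed" and (tracking_with_finalizers or p.in_api)
    )


def on_failure_retry_count(pods: List[PodRecord]) -> int:
    """Second calculation (restartPolicy = "OnFailure" only): retries in all
    containers of Pods whose phase is Pending or Running."""
    return sum(
        sum(p.container_restarts)
        for p in pods
        if p.phase in ("Pending", "Running")
    )


def job_failed(
    pods: List[PodRecord],
    backoff_limit: int = 6,  # default .spec.backoffLimit
    restart_policy: str = "Never",
    tracking_with_finalizers: bool = True,
) -> bool:
    """True if either calculation reaches backoff_limit (assumed to mean >=)."""
    if failed_pod_count(pods, tracking_with_finalizers) >= backoff_limit:
        return True
    if restart_policy == "OnFailure" and on_failure_retry_count(pods) >= backoff_limit:
        return True
    return False
```

Keeping the two counts in separate functions mirrors the "either of the calculations" rule: the Job is treated as failed as soon as one of them hits the limit, so the counts are never summed. For example, one removed failed Pod, one failed Pod still in the API, and one running Pod with container restarts `[2, 1]` give a failed-Pod count of 2 (or 1 with tracking disabled) and an `OnFailure` retry count of 3.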