Merge pull request #33536 from alculquicondor/job-failures

Accurate explanation for the calculation of number of failures in Job
Kubernetes Prow Robot 2022-06-15 21:42:48 -07:00 committed by GitHub
commit b6e815beb4
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 14 additions and 4 deletions


@@ -253,9 +253,19 @@ due to a logical error in configuration etc.
 To do so, set `.spec.backoffLimit` to specify the number of retries before
 considering a Job as failed. The back-off limit is set by default to 6. Failed
 Pods associated with the Job are recreated by the Job controller with an
-exponential back-off delay (10s, 20s, 40s ...) capped at six minutes. The
-back-off count is reset when a Job's Pod is deleted or successful without any
-other Pods for the Job failing around that time.
+exponential back-off delay (10s, 20s, 40s ...) capped at six minutes.
+The number of retries is calculated in two ways:
+- The number of Pods with `.status.phase = "Failed"`.
+- When using `restartPolicy = "OnFailure"`, the number of retries in all the
+containers of Pods with `.status.phase` equal to `Pending` or `Running`.
+If either of the calculations reaches the `.spec.backoffLimit`, the Job is
+considered failed.
+When the [`JobTrackingWithFinalizers`](#job-tracking-with-finalizers) feature is
+disabled, the number of failed Pods is only based on Pods that are still present
+in the API.
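
Reviewer sketch (not part of the commit): the back-off delay and the two retry calculations described in the added text can be illustrated roughly as follows. This is a simplified model, not the Job controller's code. `PodSketch`, `backoff_delay_seconds`, and `job_has_failed` are hypothetical names, and "reaches" is read here as "greater than or equal to", which is an assumption about the wording rather than a statement about the controller.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PodSketch:
    """Hypothetical, simplified stand-in for a Pod (not a Kubernetes API type)."""
    phase: str  # "Pending", "Running", "Succeeded" or "Failed"
    container_restarts: List[int] = field(default_factory=list)


def backoff_delay_seconds(failures: int, base: int = 10, cap: int = 360) -> int:
    """Delay before the next retry: 10s, 20s, 40s, ... capped at six minutes."""
    if failures < 1:
        return 0
    return min(base * 2 ** (failures - 1), cap)


def job_has_failed(pods: List[PodSketch], backoff_limit: int = 6,
                   restart_policy: str = "Never") -> bool:
    """Apply both calculations; the Job fails if either reaches the limit."""
    # First calculation: number of Pods whose phase is "Failed".
    failed_pods = sum(1 for p in pods if p.phase == "Failed")

    # Second calculation: only with restartPolicy "OnFailure", sum the
    # container retries of Pods that are still Pending or Running.
    restarts = 0
    if restart_policy == "OnFailure":
        restarts = sum(
            sum(p.container_restarts)
            for p in pods
            if p.phase in ("Pending", "Running")
        )

    # Assumption: "reaches" is interpreted as >= the back-off limit.
    return failed_pods >= backoff_limit or restarts >= backoff_limit
```

For example, under these assumptions a single `Running` Pod whose two containers have restarted three times each counts as six retries with `restartPolicy = "OnFailure"`, but as zero with `restartPolicy = "Never"`.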
 {{< note >}}
 If your job has `restartPolicy = "OnFailure"`, keep in mind that your Pod running the Job