Fix description of back-off count reset

By carefully reading the code in `job_controller.go`, I finally understood that
the back-off count is reset when `forget` is true, which happens when `active`
or `successful` changes without any new failures at that moment. That
happens in this code:

dd649bb7ef/pkg/controller/job/job_controller.go (L588)

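To make that concrete, here is a minimal sketch of the decision as I understand it. The names (`jobStatus`, `shouldForgetBackoff`) are made up for illustration; this paraphrases the logic rather than reproducing the actual `job_controller.go` code:

```go
// jobStatus holds the pod counts the controller compares between syncs.
type jobStatus struct {
	Active, Succeeded, Failed int32
}

// shouldForgetBackoff paraphrases the condition described above: the job's
// back-off count is reset ("forgotten") when the number of active or
// succeeded pods changed during this sync and no new failures showed up at
// the same time.
func shouldForgetBackoff(prev, cur jobStatus) bool {
	if cur.Failed > prev.Failed {
		// New failures in this sync: keep the accumulated back-off count.
		return false
	}
	return cur.Active != prev.Active || cur.Succeeded != prev.Succeeded
}
```

As far as I can tell, when that condition holds the controller drops the job key's rate-limiting history (its work queue's `Forget`), which is what clears the back-off delay.
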
That behavior does not match what this document says. My change fixes the doc
to match the code.

It might be better to fix the behavior to match the doc, since the behavior is
kind of weird to describe. But I imagine that the Kubernetes team will need to
consider factors I'm not aware of before deciding to change job back-off
behavior, so I am not going to the effort of proposing a change like that.
Leon Barrett 2020-07-10 14:09:29 -07:00 committed by Leon Barrett
parent 6bf2feba74
commit 7b92c46503
1 changed file with 2 additions and 2 deletions

@@ -215,8 +215,8 @@ To do so, set `.spec.backoffLimit` to specify the number of retries before
 considering a Job as failed. The back-off limit is set by default to 6. Failed
 Pods associated with the Job are recreated by the Job controller with an
 exponential back-off delay (10s, 20s, 40s ...) capped at six minutes. The
-back-off count is reset if no new failed Pods appear before the Job's next
-status check.
+back-off count is reset when a job pod is deleted or successful without any
+other pods failing around that time.
 {{< note >}}
 If your job has `restartPolicy = "OnFailure"`, keep in mind that your container running the Job
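
For context on the passage being edited, here is a rough Go sketch of a Job with `.spec.backoffLimit` set, using the `k8s.io/api` types. Only the field name and its default of 6 come from the passage above; the rest is made-up illustration. The command makes the pod fail every time, so the controller keeps recreating it with the exponential back-off described above until the limit is reached.

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	job := batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "example"},
		Spec: batchv1.JobSpec{
			// .spec.backoffLimit: retries before the Job is marked failed.
			// 6 is the documented default.
			BackoffLimit: int32Ptr(6),
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "example",
						Image: "busybox",
						// Always fails, so the Job controller retries with back-off.
						Command: []string{"sh", "-c", "exit 1"},
					}},
				},
			},
		},
	}
	fmt.Println("backoffLimit:", *job.Spec.BackoffLimit)
}
```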