Just a small clarification on the description of the responsibilities of a Job. Kubernetes doesn't ensure that the Pods are successful; rather, the Job controller keeps creating new Pods until one succeeds (or the back-off limit is exceeded). Updating the docs with a slight wording change to reflect more accurately that a Job won't ensure success, but it will keep retrying until a Pod succeeds.
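For illustration, here is a minimal Go sketch of the semantics described above (purely conceptual; the function names are made up and this is not how the controller is actually implemented):

```go
package main

import "fmt"

// runPod stands in for running one Pod of the Job to completion.
// Purely illustrative: here the workload happens to succeed on the third try.
func runPod(attempt int) bool {
	return attempt >= 3
}

// runJob sketches the documented semantics: the Job does not guarantee that a
// Pod succeeds; it keeps creating replacement Pods until one succeeds or a
// retry budget (analogous to .spec.backoffLimit) is used up.
func runJob(backoffLimit int) bool {
	for attempt := 1; attempt <= backoffLimit+1; attempt++ {
		if runPod(attempt) {
			fmt.Printf("attempt %d succeeded\n", attempt)
			return true
		}
		fmt.Printf("attempt %d failed, retrying\n", attempt)
	}
	fmt.Println("back-off limit exceeded; the Job is marked failed")
	return false
}

func main() {
	runJob(6) // 6 is the default .spec.backoffLimit
}
```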
* Revise Pod concept
Adapt the existing Pod documentation to suit the Docsy theme, by
promoting the Pod concept itself to /docs/concepts/workloads/pods/
Following on from this, update the Pod Lifecycle page so that it covers the
lifecycle of a Pod and reads as a direct continuation of the Pod concept,
for readers keen to understand things in detail.
This change also removes the automatic contents list from the Pod
overview page. Instead, the new page links to all the pages
inside the Pod section.
* Update links to Pod concept
Link to updated content
* Incorporate Pod concept suggestions
Co-authored-by: Celeste Horgan <celeste@cncf.io>
* Revise StatefulSet suggestion for Pod concept
Co-authored-by: Celeste Horgan <celeste@cncf.io>
By carefully reading the code in `job_controller.go`, I finally understood that
the back-off count is reset when `forget` is true, which happens when the
`active` or `succeeded` count changes without any new failures in that sync.
That happens in this code:
dd649bb7ef/pkg/controller/job/job_controller.go (L588)
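For reference, a rough paraphrase of that decision in Go (simplified and illustrative; the type, field, and function names here do not exactly match the controller source):

```go
package main

import "fmt"

// jobStatus is a simplified stand-in for the recorded Job status.
type jobStatus struct {
	Active    int32
	Succeeded int32
	Failed    int32
}

// shouldForget mirrors the shape of the logic: if the freshly observed counts
// differ from the recorded status and this sync saw no new failures, the
// controller "forgets" the Job key, which resets its back-off count.
func shouldForget(recorded jobStatus, active, succeeded, failed int32, newFailure bool) bool {
	statusChanged := recorded.Active != active ||
		recorded.Succeeded != succeeded ||
		recorded.Failed != failed
	if !statusChanged || newFailure {
		// With a new failure, the Job is re-enqueued with back-off instead.
		return false
	}
	return true
}

func main() {
	recorded := jobStatus{Active: 2, Succeeded: 0, Failed: 1}

	// A Pod succeeded and nothing newly failed: the back-off count is reset.
	fmt.Println(shouldForget(recorded, 1, 1, 1, false)) // true

	// A Pod failed in this sync: the back-off count keeps growing.
	fmt.Println(shouldForget(recorded, 2, 0, 2, true)) // false
}
```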
That behavior does not match what this document says. My change fixes the doc
to match the code.
It might be better to fix the behavior to match the doc, since the current
behavior is awkward to describe. But I imagine the Kubernetes team would need
to weigh factors I'm not aware of before changing the Job back-off behavior,
so I'm not proposing a change like that here.