KEP-3998: move section to before Job termination and cleanup
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
parent 92a00327bb
commit 105d90a04b
@@ -550,6 +550,63 @@ terminating Pods only once these Pods reach the terminal `Failed` phase. This be
to `podReplacementPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy).
{{< /note >}}

## Success policy {#success-policy}

{{< feature-state feature_gate_name="JobSuccessPolicy" >}}

{{< note >}}
You can only configure a success policy for an Indexed Job if you have the
`JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled in your cluster.
{{< /note >}}

When you run an Indexed Job, a success policy, defined in the `.spec.successPolicy` field,
allows you to define when the Job can be declared as succeeded based on the number of succeeded Pods.

In some situations, you may want more control over how Pod successes are handled
than `.spec.completions` provides.
Here are some example use cases:

* To optimize the cost of running workloads by not running Pods unnecessarily,
  you can terminate a Job as soon as one of its Pods succeeds.
* To let only a leader index determine the success or failure of a Job
  in batch workloads such as MPI and PyTorch.

You can configure a success policy, in the `.spec.successPolicy` field,
to meet the above use cases. This policy can handle Job success based on the
number of succeeded Pods. After the Job meets the success policy, the Job controller
terminates the lingering Pods.

When you specify only `.spec.successPolicy.rules[*].succeededIndexes`,
the Job is marked as succeeded once all indexes specified in `succeededIndexes` have succeeded.
The indexes in `succeededIndexes` must be within 0 to `.spec.completions-1` and
must not contain duplicates. The value is a comma-separated list of indexes and intervals.
An interval of consecutive indexes is represented by its first and last elements, separated by a hyphen.
For example, if you want to specify 1, 3, 4, 5 and 7, the `succeededIndexes` is represented as `1,3-5,7`.
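
To make the `succeededIndexes` format concrete, here is a minimal Python sketch that expands such a string into the indexes it denotes and applies the two constraints described above (range and no duplicates). The function name `parse_succeeded_indexes` and its `completions` parameter are hypothetical, chosen for illustration. This is not the Job controller's or the API server's implementation, and the real validation may differ in detail.

```python
def parse_succeeded_indexes(value: str, completions: int) -> list[int]:
    """Expand a string such as "1,3-5,7" into a sorted list of indexes.

    Hypothetical helper for illustration only.
    """
    indexes: set[int] = set()
    for part in value.split(","):
        # "3-5" -> ("3", "-", "5"); "7" -> ("7", "", "")
        first, sep, last = part.strip().partition("-")
        start = int(first)
        end = int(last) if sep else start
        if start > end:
            raise ValueError(f"interval {part!r} is reversed")
        for index in range(start, end + 1):
            # Each index must lie within 0 to completions-1.
            if not 0 <= index < completions:
                raise ValueError(f"index {index} is outside 0..{completions - 1}")
            # Each index may be listed only once.
            if index in indexes:
                raise ValueError(f"index {index} is listed more than once")
            indexes.add(index)
    return sorted(indexes)


print(parse_succeeded_indexes("1,3-5,7", completions=10))  # [1, 3, 4, 5, 7]
```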

When you specify only `.spec.successPolicy.rules[*].succeededCount`,
the Job is marked as succeeded once the number of succeeded indexes reaches `succeededCount`.

When you specify both `succeededIndexes` and `succeededCount`,
the Job is marked as succeeded once the number of succeeded indexes
from the set specified in `succeededIndexes` reaches `succeededCount`.

Note that when you specify multiple rules in `.spec.successPolicy.rules`,
the rules are evaluated in order. Once the Job meets a rule, the remaining rules are ignored.
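
The three cases above, and the in-order evaluation of rules, can be sketched in Python as follows. This is an illustration of the behavior as described in this section, not the Job controller's code. The `Rule` class and the functions `rule_is_met` and `first_met_rule` are hypothetical names, and `succeeded_indexes` is assumed to be already expanded into a set of integers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical in-memory form of one entry in `.spec.successPolicy.rules`."""
    succeeded_indexes: Optional[frozenset[int]] = None  # e.g. "1,3-5,7" already expanded
    succeeded_count: Optional[int] = None


def rule_is_met(rule: Rule, succeeded: set[int]) -> bool:
    # With succeededIndexes, only the listed indexes are considered;
    # without it, every succeeded index counts.
    if rule.succeeded_indexes is None:
        relevant = succeeded
    else:
        relevant = succeeded & rule.succeeded_indexes
    if rule.succeeded_count is not None:
        return len(relevant) >= rule.succeeded_count
    # Only succeededIndexes is set: every listed index must have succeeded.
    return rule.succeeded_indexes is not None and relevant == rule.succeeded_indexes


def first_met_rule(rules: list[Rule], succeeded: set[int]) -> Optional[int]:
    """Return the position of the first rule that is met, or None."""
    for position, rule in enumerate(rules):
        if rule_is_met(rule, succeeded):
            return position  # the remaining rules are ignored
    return None


rules = [
    Rule(succeeded_indexes=frozenset({0, 1, 2}), succeeded_count=1),
    Rule(succeeded_count=5),
]
print(first_met_rule(rules, succeeded={2, 7}))  # 0
print(first_met_rule(rules, succeeded={7, 8}))  # None
```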

Here is a manifest for a Job with `successPolicy`:

{{% code_sample file="/controllers/job-success-policy-example.yaml" %}}

In the example above, the rule of the success policy specifies that
the Job controller should mark the Job as succeeded and terminate the lingering Pods
if any one of the indexes 0, 1, and 2 succeeds.

{{< note >}}
When you specify both a success policy and terminating policies such as `.spec.backoffLimit` and `.spec.podFailurePolicy`,
once the Job meets both policies, the terminating policies are respected and the success policy is ignored.
{{< /note >}}

## Job termination and cleanup

When a Job completes, no more Pods are created, but the Pods are [usually](#pod-backoff-failure-policy) not deleted either.

@@ -1050,63 +1107,6 @@ after the operation: the built-in Job controller and the external controller
indicated by the field value.
{{< /warning >}}

### Success policy {#success-policy}

{{< feature-state for_k8s_version="v1.29" state="alpha" >}}

{{< note >}}
You can only configure a success policy for an Indexed Job if you have the
`JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled in your cluster.
{{< /note >}}

When you run an Indexed Job, a success policy, defined in the `.spec.successPolicy` field,
allows you to define when the Job can be declared as succeeded based on the number of succeeded Pods.

In some situations, you may want more control over how Pod successes are handled
than `.spec.completions` provides.
Here are some example use cases:

* To optimize the cost of running workloads by not running Pods unnecessarily,
  you can terminate a Job as soon as one of its Pods succeeds.
* To let only a leader index determine the success or failure of a Job
  in batch workloads such as MPI and PyTorch.

You can configure a success policy, in the `.spec.successPolicy` field,
to meet the above use cases. This policy can handle Job success based on the
number of succeeded Pods. After the Job meets the success policy, the Job controller
terminates the lingering Pods.

When you specify only `.spec.successPolicy.rules[*].succeededIndexes`,
the Job is marked as succeeded once all indexes specified in `succeededIndexes` have succeeded.
The indexes in `succeededIndexes` must be within 0 to `.spec.completions-1` and
must not contain duplicates. The value is a comma-separated list of indexes and intervals.
An interval of consecutive indexes is represented by its first and last elements, separated by a hyphen.
For example, if you want to specify 1, 3, 4, 5 and 7, the `succeededIndexes` is represented as `1,3-5,7`.

When you specify only `.spec.successPolicy.rules[*].succeededCount`,
the Job is marked as succeeded once the number of succeeded indexes reaches `succeededCount`.

When you specify both `succeededIndexes` and `succeededCount`,
the Job is marked as succeeded once the number of succeeded indexes
from the set specified in `succeededIndexes` reaches `succeededCount`.

Note that when you specify multiple rules in `.spec.successPolicy.rules`,
the rules are evaluated in order. Once the Job meets a rule, the remaining rules are ignored.

Here is a manifest for a Job with `successPolicy`:

{{% code_sample file="/controllers/job-success-policy-example.yaml" %}}

In the example above, the rule of the success policy specifies that
the Job controller should mark the Job as succeeded and terminate the lingering Pods
if any one of the indexes 0, 1, and 2 succeeds.

{{< note >}}
When you specify both a success policy and terminating policies such as `.spec.backoffLimit` and `.spec.podFailurePolicy`,
once the Job meets both policies, the terminating policies are respected and the success policy is ignored.
{{< /note >}}

## Alternatives

### Bare Pods