From f1fe316eec07edb08b9e2cf7d9b6c9a0ec11d473 Mon Sep 17 00:00:00 2001
From: Kevin Hannon
Date: Fri, 23 Jun 2023 09:56:44 -0400
Subject: [PATCH 1/4] add a section on podrecreationpolicy for jobs

---
 .../concepts/workloads/controllers/job.md     | 50 +++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/content/en/docs/concepts/workloads/controllers/job.md b/content/en/docs/concepts/workloads/controllers/job.md
index be6a66c9db..fd6572ba75 100644
--- a/content/en/docs/concepts/workloads/controllers/job.md
+++ b/content/en/docs/concepts/workloads/controllers/job.md
@@ -457,6 +457,12 @@ Since Kubernetes 1.27, Kubelet transitions deleted pods to a terminal phase
 ensures that deleted pods have their finalizers removed by the Job controller.
 {{< /note >}}
 
+{{< note >}}
+Since Kubernetes v1.28, when pod failure policy is used, the Job controller recreates
+terminating pods only once they reach the terminal `Failed` phase. This behavior is analogous
+to when using `podRecreationPolicy: Failed`, see [pod replacement policy](#pod-replacement-policy) for more details.
+{{< /note >}}
+
 ## Job termination and cleanup
 
 When a Job completes, no more Pods are created, but the Pods are [usually](#pod-backoff-failure-policy) not deleted either.
@@ -867,6 +873,50 @@ is disabled, `.spec.completions` is immutable.
 Use cases for elastic Indexed Jobs include batch workloads which require
 scaling an indexed Job, such as MPI, Horovord, Ray, and PyTorch training jobs.
 
+### Pod Replacement Policy
+
+{{< feature-state for_k8s_version="v1.28" state="alpha" >}}
+
+{{< note >}}
+You can only enable `PodReplacementPolicy` for Jobs if you set `JobPodReplacementPolicy`[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to true.
+{{< /note >}}
+
+By default, the Job controller recreates pods as soon they are either failed or terminating (have a deletion timestamp).
+This means that, at a given time, the number of running Pods for the Jobs can be greater than `parallelism` or, if using Indexed Jobs, more than one running Pod per index, if some of the Pods are terminating.
+
+You may choose to create replacement pods only when the terminating pod is fully terminal (has `status.phase: Failed`). To do this, set the `.spec.podReplacementPolicy: Failed`.
+This will only recreate pods once they are terminated.
+Default behavior will be to recreate upon deletion (`DeletionTimestamp != nil`).
+
+```yaml
+kind: Job
+metadata:
+  name: new
+  ...
+spec:
+  podReplacementPolicy: Failed
+  ...
+```
+
+You can inspect a new field in the JobStatus called `terminating`.
+This will report the number pods that are currently terminating and is easily viewable in the status.
+
+```shell
+kubectl get jobs/myjob -o yaml
+```
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+# .metadata and .spec omitted
+status:
+  terminating: 1 # if pod is terminating
+```
+
+When [PodFailurePolicy](#pod-failure-policy) is enabled, a Job will have a default (and only this value is allowed) value of `Failed`.
+If `JobPodReplacementPolicy` is disabled and `podFailurePolicy` is enabled, a Job will wait for terminating pods to be fully terminated before marking the pod as failed.
+In this case, you will not be able to inspect the `terminating` field.
+
 ## Alternatives
 
 ### Bare Pods

From db4ba0632d48a37082d5396541635cfa15baee0a Mon Sep 17 00:00:00 2001
From: Kevin Hannon
Date: Thu, 27 Jul 2023 10:24:42 -0400
Subject: [PATCH 2/4] add feature toggle for podreplacementpolicy

---
 .../reference/command-line-tools-reference/feature-gates.md     | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index 9d59376647..a2ced7aca1 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -120,6 +120,7 @@ For a reference to old feature gates that are removed, please refer to
 | `InTreePluginvSphereUnregister` | `false` | Alpha | 1.21 | |
 | `JobPodFailurePolicy` | `false` | Alpha | 1.25 | 1.25 |
 | `JobPodFailurePolicy` | `true` | Beta | 1.26 | |
+| `JobPodReplacementPolicy` | `false` | Alpha | 1.28 | |
 | `JobReadyPods` | `false` | Alpha | 1.23 | 1.23 |
 | `JobReadyPods` | `true` | Beta | 1.24 | |
 | `KMSv2` | `false` | Alpha | 1.25 | 1.26 |
@@ -551,6 +552,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
   the pod template of [Job](/docs/concepts/workloads/controllers/job).
 - `JobPodFailurePolicy`: Allow users to specify handling of pod failures based on container
   exit codes and pod conditions.
+- `JobPodReplacementPolicy`: Allows users to specify pod replacement for terminating pods in a [Job](/docs/concepts/workloads/controllers/job)
 - `JobReadyPods`: Enables tracking the number of Pods that have a `Ready`
   [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).
   The count of `Ready` pods is recorded in the

From cd0de2832aa65ee974eed8ff4eb6af2c1f815bf9 Mon Sep 17 00:00:00 2001
From: Kevin Hannon
Date: Thu, 27 Jul 2023 10:34:26 -0400
Subject: [PATCH 3/4] address comments

---
 content/en/docs/concepts/workloads/controllers/job.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/concepts/workloads/controllers/job.md b/content/en/docs/concepts/workloads/controllers/job.md
index fd6572ba75..2075380420 100644
--- a/content/en/docs/concepts/workloads/controllers/job.md
+++ b/content/en/docs/concepts/workloads/controllers/job.md
@@ -878,7 +878,7 @@ scaling an indexed Job, such as MPI, Horovord, Ray, and PyTorch training jobs.
 {{< feature-state for_k8s_version="v1.28" state="alpha" >}}
 
 {{< note >}}
-You can only enable `PodReplacementPolicy` for Jobs if you set `JobPodReplacementPolicy`[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to true.
+You can only set `podReplacementPolicy` on Jobs if you enable the `JobPodReplacementPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
 {{< /note >}}
 
 By default, the Job controller recreates pods as soon they are either failed or terminating (have a deletion timestamp).
@@ -886,7 +886,7 @@ This means that, at a given time, the number of running Pods for the Jobs can be
 
 You may choose to create replacement pods only when the terminating pod is fully terminal (has `status.phase: Failed`). To do this, set the `.spec.podReplacementPolicy: Failed`.
 This will only recreate pods once they are terminated.
-Default behavior will be to recreate upon deletion (`DeletionTimestamp != nil`).
+The default policy is `FailedOrTerminating`, meaning that the control plane creates replacement Pods upon deletion (`DeletionTimestamp != nil`).
 
 ```yaml
 kind: Job
@@ -913,7 +913,7 @@ status:
   terminating: 1 # if pod is terminating
 ```
 
-When [PodFailurePolicy](#pod-failure-policy) is enabled, a Job will have a default (and only this value is allowed) value of `Failed`.
+When you use a [podFailurePolicy](#pod-failure-policy) in a Job, the Job will have a default `podReplacementPolicy` value of `Failed`, and this is the only policy allowed.
 If `JobPodReplacementPolicy` is disabled and `podFailurePolicy` is enabled, a Job will wait for terminating pods to be fully terminated before marking the pod as failed.
 In this case, you will not be able to inspect the `terminating` field.
 

From 4afba1c60990420016905a930ba9f7156c1b71a8 Mon Sep 17 00:00:00 2001
From: Kevin Hannon
Date: Mon, 31 Jul 2023 14:15:34 -0400
Subject: [PATCH 4/4] add suggestions

---
 .../concepts/workloads/controllers/job.md     | 35 ++++++++++---------
 .../feature-gates.md                          |  2 +-
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/content/en/docs/concepts/workloads/controllers/job.md b/content/en/docs/concepts/workloads/controllers/job.md
index 2075380420..eb303741cd 100644
--- a/content/en/docs/concepts/workloads/controllers/job.md
+++ b/content/en/docs/concepts/workloads/controllers/job.md
@@ -458,9 +458,9 @@ ensures that deleted pods have their finalizers removed by the Job controller.
 {{< /note >}}
 
 {{< note >}}
-Since Kubernetes v1.28, when pod failure policy is used, the Job controller recreates
-terminating pods only once they reach the terminal `Failed` phase. This behavior is analogous
-to when using `podRecreationPolicy: Failed`, see [pod replacement policy](#pod-replacement-policy) for more details.
+Starting with Kubernetes v1.28, when Pod failure policy is used, the Job controller recreates
+terminating Pods only once these Pods reach the terminal `Failed` phase. This behavior is similar
+to `podReplacementPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy).
 {{< /note >}}
 
 ## Job termination and cleanup
@@ -873,7 +873,7 @@ is disabled, `.spec.completions` is immutable.
 Use cases for elastic Indexed Jobs include batch workloads which require
 scaling an indexed Job, such as MPI, Horovord, Ray, and PyTorch training jobs.
 
-### Pod Replacement Policy
+### Delayed creation of replacement pods
 
 {{< feature-state for_k8s_version="v1.28" state="alpha" >}}
 
@@ -881,12 +881,19 @@ scaling an indexed Job, such as MPI, Horovord, Ray, and PyTorch training jobs.
 You can only set `podReplacementPolicy` on Jobs if you enable the `JobPodReplacementPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
 {{< /note >}}
 
-By default, the Job controller recreates pods as soon they are either failed or terminating (have a deletion timestamp).
-This means that, at a given time, the number of running Pods for the Jobs can be greater than `parallelism` or, if using Indexed Jobs, more than one running Pod per index, if some of the Pods are terminating.
+By default, the Job controller recreates Pods as soon as they either fail or are terminating (have a deletion timestamp).
+This means that, at a given time, when some of the Pods are terminating, the number of running Pods for the Jobs can be greater than `parallelism` or greater than one Pod per index (if using Indexed Jobs).
 
-You may choose to create replacement pods only when the terminating pod is fully terminal (has `status.phase: Failed`). To do this, set the `.spec.podReplacementPolicy: Failed`.
-This will only recreate pods once they are terminated.
-The default policy is `FailedOrTerminating`, meaning that the control plane creates replacement Pods upon deletion (`DeletionTimestamp != nil`).
+You may choose to create replacement Pods only when the terminating Pod is fully terminal (has `status.phase: Failed`). To do this, set `.spec.podReplacementPolicy: Failed`.
+This will only recreate Pods once they are terminated.
+The default replacement policy depends on whether the Job has a `podFailurePolicy` set.
+With no Pod failure policy defined for a Job, omitting the `podReplacementPolicy` field selects the
+`FailedOrTerminating` replacement policy:
+the control plane creates replacement Pods immediately upon Pod deletion
+(as soon as the control plane sees that a Pod for this Job has `deletionTimestamp` set).
+For Jobs with a Pod failure policy set, the default `podReplacementPolicy` is `Failed`, and no other
+value is permitted.
+See [Pod failure policy](#pod-failure-policy) to learn more about Pod failure policies for Jobs.
 
 ```yaml
 kind: Job
@@ -898,8 +905,8 @@ spec:
   ...
 ```
 
-You can inspect a new field in the JobStatus called `terminating`.
-This will report the number pods that are currently terminating and is easily viewable in the status.
+Provided your cluster has the feature gate enabled, you can inspect the `.status.terminating` field of a Job.
+The value of the field is the number of Pods owned by the Job that are currently terminating.
 
 ```shell
 kubectl get jobs/myjob -o yaml
 ```
@@ -910,13 +917,9 @@ apiVersion: batch/v1
 kind: Job
 # .metadata and .spec omitted
 status:
-  terminating: 1 # if pod is terminating
+  terminating: 3 # three Pods are terminating and have not yet reached the Failed phase
 ```
 
-When you use a [podFailurePolicy](#pod-failure-policy) in a Job, the Job will have a default `podReplacementPolicy` value of `Failed`, and this is the only policy allowed.
-If `JobPodReplacementPolicy` is disabled and `podFailurePolicy` is enabled, a Job will wait for terminating pods to be fully terminated before marking the pod as failed.
-In this case, you will not be able to inspect the `terminating` field.
-
 ## Alternatives
 
 ### Bare Pods
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index a2ced7aca1..cf720e4070 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -552,7 +552,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
   the pod template of [Job](/docs/concepts/workloads/controllers/job).
 - `JobPodFailurePolicy`: Allow users to specify handling of pod failures based on container
   exit codes and pod conditions.
-- `JobPodReplacementPolicy`: Allows users to specify pod replacement for terminating pods in a [Job](/docs/concepts/workloads/controllers/job)
+- `JobPodReplacementPolicy`: Allows you to specify pod replacement for terminating pods in a [Job](/docs/concepts/workloads/controllers/job).
 - `JobReadyPods`: Enables tracking the number of Pods that have a `Ready`
   [condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).
   The count of `Ready` pods is recorded in the