diff --git a/content/en/docs/concepts/workloads/controllers/job.md b/content/en/docs/concepts/workloads/controllers/job.md index d5f505473a..97a865fc22 100644 --- a/content/en/docs/concepts/workloads/controllers/job.md +++ b/content/en/docs/concepts/workloads/controllers/job.md @@ -16,7 +16,8 @@ weight: 50 A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the task (ie, Job) is complete. Deleting a Job will clean up -the Pods it created. +the Pods it created. Suspending a Job will delete its active Pods until the Job +is resumed again. A simple case is to create one Job object in order to reliably run one Pod to completion. The Job object will start a new Pod if the first Pod fails or is deleted (for example @@ -404,6 +405,107 @@ Here, `W` is the number of work items. ## Advanced usage +### Suspending a Job + +{{< feature-state for_k8s_version="v1.21" state="alpha" >}} + +{{< note >}} +Suspending Jobs is available in Kubernetes versions 1.21 and above. You must +enable the `SuspendJob` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) +on the [API server](docs/reference/command-line-tools-reference/kube-apiserver/) +and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/) +in order to use this feature. +{{< /note >}} + +When a Job is created, the Job controller will immediately begin creating Pods +to satisfy the Job's requirements and will continue to do so until the Job is +complete. However, you may want to temporarily suspend a Job's execution and +resume it later. To suspend a Job, you can update the `.spec.suspend` field of +the Job to true; later, when you want to resume it again, update it to false. +Creating a Job with `.spec.suspend` set to true will create it in the suspended +state. + +When a Job is resumed from suspension, its `.status.startTime` field will be +reset to the current time. This means that the `.spec.activeDeadlineSeconds` +timer will be stopped and reset when a Job is suspended and resumed. + +Remember that suspending a Job will delete all active Pods. When the Job is +suspended, your [Pods will be terminated](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination) +with a SIGTERM signal. The Pod's graceful termination period will be honored and +your Pod must handle this signal in this period. This may involve saving +progress for later or undoing changes. Pods terminated this way will not count +towards the Job's `completions` count. + +An example Job definition in the suspended state can be like so: + +```shell +kubectl get job myjob -o yaml +``` + +```yaml +apiVersion: batch/v1 +kind: Job +metadata: + name: myjob +spec: + suspend: true + parallelism: 1 + completions: 5 + template: + spec: + ... +``` + +The Job's status can be used to determine if a Job is suspended or has been +suspended in the past: + +```shell +kubectl get jobs/myjob -o yaml +``` + +```json +apiVersion: batch/v1 +kind: Job +# .metadata and .spec omitted +status: + conditions: + - lastProbeTime: "2021-02-05T13:14:33Z" + lastTransitionTime: "2021-02-05T13:14:33Z" + status: "True" + type: Suspended + startTime: "2021-02-05T13:13:48Z" +``` + +The Job condition of type "Suspended" with status "True" means the Job is +suspended; the `lastTransitionTime` field can be used to determine how long the +Job has been suspended for. If the status of that condition is "False", then the +Job was previously suspended and is now running. If such a condition does not +exist in the Job's status, the Job has never been stopped. + +Events are also created when the Job is suspended and resumed: + +```shell +kubectl describe jobs/myjob +``` + +``` +Name: myjob +... +Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Normal SuccessfulCreate 12m job-controller Created pod: myjob-hlrpl + Normal SuccessfulDelete 11m job-controller Deleted pod: myjob-hlrpl + Normal Suspended 11m job-controller Job suspended + Normal SuccessfulCreate 3s job-controller Created pod: myjob-jvb44 + Normal Resumed 3s job-controller Job resumed +``` + +The last four events, particularly the "Suspended" and "Resumed" events, are +directly a result of toggling the `.spec.suspend` field. In the time between +these two events, we see that no Pods were created, but Pod creation restarted +as soon as the Job was resumed. + ### Specifying your own Pod selector Normally, when you create a Job object, you do not specify `.spec.selector`. diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md index 2c181ca331..e03c973e1c 100644 --- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md +++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md @@ -170,6 +170,7 @@ different Kubernetes components. | `StorageVersionAPI` | `false` | Alpha | 1.20 | | | `StorageVersionHash` | `false` | Alpha | 1.14 | 1.14 | | `StorageVersionHash` | `true` | Beta | 1.15 | | +| `SuspendJob` | `false` | Alpha | 1.21 | | | `Sysctls` | `true` | Beta | 1.11 | | | `TTLAfterFinished` | `false` | Alpha | 1.12 | | | `TopologyManager` | `false` | Alpha | 1.16 | 1.17 | @@ -775,6 +776,9 @@ Each feature gate is designed for enabling/disabling a specific feature: options can be specified to ensure that the specified number of process IDs will be reserved for the system as a whole and for Kubernetes system daemons respectively. +- `SuspendJob`: Enable support to suspend and resume Jobs. See + [the Jobs docs](/docs/concepts/workloads/controllers/job/) for + more details. - `Sysctls`: Enable support for namespaced kernel parameters (sysctls) that can be set for each pod. See [sysctls](/docs/tasks/administer-cluster/sysctl-cluster/) for more details.