Merge pull request #26466 from adtac/suspend-1.21

job.md: add section on suspended jobs
This commit is contained in:
Kubernetes Prow Robot 2021-03-24 09:26:08 -07:00 committed by GitHub
commit 6adc893ffa
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 107 additions and 1 deletions

View File

@ -16,7 +16,8 @@ weight: 50
A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate.
As pods successfully complete, the Job tracks the successful completions. When a specified number
of successful completions is reached, the task (ie, Job) is complete. Deleting a Job will clean up
the Pods it created.
the Pods it created. Suspending a Job will delete its active Pods until the Job
is resumed again.
A simple case is to create one Job object in order to reliably run one Pod to completion.
The Job object will start a new Pod if the first Pod fails or is deleted (for example
@ -404,6 +405,107 @@ Here, `W` is the number of work items.
## Advanced usage
### Suspending a Job
{{< feature-state for_k8s_version="v1.21" state="alpha" >}}
{{< note >}}
Suspending Jobs is available in Kubernetes versions 1.21 and above. You must
enable the `SuspendJob` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
on the [API server](docs/reference/command-line-tools-reference/kube-apiserver/)
and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/)
in order to use this feature.
{{< /note >}}
When a Job is created, the Job controller will immediately begin creating Pods
to satisfy the Job's requirements and will continue to do so until the Job is
complete. However, you may want to temporarily suspend a Job's execution and
resume it later. To suspend a Job, you can update the `.spec.suspend` field of
the Job to true; later, when you want to resume it again, update it to false.
Creating a Job with `.spec.suspend` set to true will create it in the suspended
state.
When a Job is resumed from suspension, its `.status.startTime` field will be
reset to the current time. This means that the `.spec.activeDeadlineSeconds`
timer will be stopped and reset when a Job is suspended and resumed.
Remember that suspending a Job will delete all active Pods. When the Job is
suspended, your [Pods will be terminated](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
with a SIGTERM signal. The Pod's graceful termination period will be honored and
your Pod must handle this signal in this period. This may involve saving
progress for later or undoing changes. Pods terminated this way will not count
towards the Job's `completions` count.
An example Job definition in the suspended state can be like so:
```shell
kubectl get job myjob -o yaml
```
```yaml
apiVersion: batch/v1
kind: Job
metadata:
name: myjob
spec:
suspend: true
parallelism: 1
completions: 5
template:
spec:
...
```
The Job's status can be used to determine if a Job is suspended or has been
suspended in the past:
```shell
kubectl get jobs/myjob -o yaml
```
```json
apiVersion: batch/v1
kind: Job
# .metadata and .spec omitted
status:
conditions:
- lastProbeTime: "2021-02-05T13:14:33Z"
lastTransitionTime: "2021-02-05T13:14:33Z"
status: "True"
type: Suspended
startTime: "2021-02-05T13:13:48Z"
```
The Job condition of type "Suspended" with status "True" means the Job is
suspended; the `lastTransitionTime` field can be used to determine how long the
Job has been suspended for. If the status of that condition is "False", then the
Job was previously suspended and is now running. If such a condition does not
exist in the Job's status, the Job has never been stopped.
Events are also created when the Job is suspended and resumed:
```shell
kubectl describe jobs/myjob
```
```
Name: myjob
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 12m job-controller Created pod: myjob-hlrpl
Normal SuccessfulDelete 11m job-controller Deleted pod: myjob-hlrpl
Normal Suspended 11m job-controller Job suspended
Normal SuccessfulCreate 3s job-controller Created pod: myjob-jvb44
Normal Resumed 3s job-controller Job resumed
```
The last four events, particularly the "Suspended" and "Resumed" events, are
directly a result of toggling the `.spec.suspend` field. In the time between
these two events, we see that no Pods were created, but Pod creation restarted
as soon as the Job was resumed.
### Specifying your own Pod selector
Normally, when you create a Job object, you do not specify `.spec.selector`.

View File

@ -170,6 +170,7 @@ different Kubernetes components.
| `StorageVersionAPI` | `false` | Alpha | 1.20 | |
| `StorageVersionHash` | `false` | Alpha | 1.14 | 1.14 |
| `StorageVersionHash` | `true` | Beta | 1.15 | |
| `SuspendJob` | `false` | Alpha | 1.21 | |
| `Sysctls` | `true` | Beta | 1.11 | |
| `TTLAfterFinished` | `false` | Alpha | 1.12 | |
| `TopologyManager` | `false` | Alpha | 1.16 | 1.17 |
@ -775,6 +776,9 @@ Each feature gate is designed for enabling/disabling a specific feature:
options can be specified to ensure that the specified number of process IDs
will be reserved for the system as a whole and for Kubernetes system daemons
respectively.
- `SuspendJob`: Enable support to suspend and resume Jobs. See
[the Jobs docs](/docs/concepts/workloads/controllers/job/) for
more details.
- `Sysctls`: Enable support for namespaced kernel parameters (sysctls) that can be
set for each pod. See
[sysctls](/docs/tasks/administer-cluster/sysctl-cluster/) for more details.