Merge pull request #26466 from adtac/suspend-1.21
job.md: add section on suspended jobs
commit 6adc893ffa
@@ -16,7 +16,8 @@ weight: 50
A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate.
As Pods successfully complete, the Job tracks the successful completions. When a specified number
of successful completions is reached, the task (i.e., the Job) is complete. Deleting a Job will clean up
the Pods it created.
the Pods it created. Suspending a Job will delete its active Pods until the Job
is resumed again.

A simple case is to create one Job object in order to reliably run one Pod to completion.
The Job object will start a new Pod if the first Pod fails or is deleted (for example
@@ -404,6 +405,107 @@ Here, `W` is the number of work items.

## Advanced usage

### Suspending a Job

{{< feature-state for_k8s_version="v1.21" state="alpha" >}}

{{< note >}}
Suspending Jobs is available in Kubernetes versions 1.21 and above. You must
enable the `SuspendJob` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
on the [API server](/docs/reference/command-line-tools-reference/kube-apiserver/)
and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/)
in order to use this feature.
{{< /note >}}

When a Job is created, the Job controller will immediately begin creating Pods
to satisfy the Job's requirements and will continue to do so until the Job is
complete. However, you may want to temporarily suspend a Job's execution and
resume it later. To suspend a Job, update the `.spec.suspend` field of
the Job to true; later, when you want to resume it, update the field to false.
Creating a Job with `.spec.suspend` set to true will create it in the suspended
state.

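As a minimal sketch of toggling the field from the command line, the following shell helpers wrap `kubectl patch`. The function names `suspend_job` and `resume_job` are hypothetical and chosen only for illustration; the sketch assumes that the `SuspendJob` feature gate is enabled and that `kubectl` is already configured for the right cluster and namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the function names are illustrative, not part of kubectl.

# Suspend a Job by setting .spec.suspend to true.
suspend_job() {
  kubectl patch "job/$1" --type=strategic --patch '{"spec":{"suspend":true}}'
}

# Resume a Job by setting .spec.suspend back to false.
resume_job() {
  kubectl patch "job/$1" --type=strategic --patch '{"spec":{"suspend":false}}'
}
```

With these definitions loaded, `suspend_job myjob` would suspend the Job named `myjob`, and `resume_job myjob` would resume it.
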
When a Job is resumed from suspension, its `.status.startTime` field will be
reset to the current time. This means that the `.spec.activeDeadlineSeconds`
timer will be stopped and reset when a Job is suspended and resumed.

Remember that suspending a Job will delete all active Pods. When the Job is
suspended, your [Pods will be terminated](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
with a SIGTERM signal. The Pod's graceful termination period will be honored and
your Pod must handle this signal in this period. This may involve saving
progress for later or undoing changes. Pods terminated this way will not count
towards the Job's `completions` count.

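One way a container entrypoint might handle the signal is sketched below. Everything named here is an assumption made for illustration: `PROGRESS_FILE`, `process_item`, `save_progress`, `on_sigterm`, and `run_worker` are hypothetical, and the checkpoint path is assumed to be on storage that outlives the Pod (for example, a mounted persistent volume).

```shell
#!/usr/bin/env bash
# Hypothetical worker entrypoint; none of these names come from Kubernetes.
# PROGRESS_FILE is assumed to live on storage that survives Pod deletion.
PROGRESS_FILE="${PROGRESS_FILE:-/data/progress}"
current_item=0

# Placeholder for the real unit of work.
process_item() {
  echo "processing item $1"
}

# The checkpoint holds the index of the next item to process.
save_progress() {
  echo "$current_item" > "$PROGRESS_FILE"
}

# On SIGTERM, write the checkpoint and exit. 143 is the conventional
# exit status for termination by SIGTERM (128 + 15).
on_sigterm() {
  save_progress
  exit 143
}
trap on_sigterm TERM

# Process items up to (but not including) index $1, starting from the
# checkpoint if one exists.
run_worker() {
  total="$1"
  if [ -s "$PROGRESS_FILE" ]; then
    current_item="$(cat "$PROGRESS_FILE")"
  fi
  while [ "$current_item" -lt "$total" ]; do
    process_item "$current_item"
    current_item=$((current_item + 1))
  done
}
```

An entrypoint would then call something like `run_worker 100`. A shell typically runs a trap handler only after the current foreground command returns, so each work item should be short relative to the graceful termination period. An item that was interrupted may be processed again after the Job is resumed, so in this sketch the work is assumed to be idempotent.
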
An example Job definition in the suspended state looks like this:

```shell
kubectl get job myjob -o yaml
```

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  suspend: true
  parallelism: 1
  completions: 5
  template:
    spec:
      ...
```

The Job's status can be used to determine if a Job is suspended or has been
suspended in the past:

```shell
kubectl get jobs/myjob -o yaml
```

```yaml
apiVersion: batch/v1
kind: Job
# .metadata and .spec omitted
status:
  conditions:
  - lastProbeTime: "2021-02-05T13:14:33Z"
    lastTransitionTime: "2021-02-05T13:14:33Z"
    status: "True"
    type: Suspended
  startTime: "2021-02-05T13:13:48Z"
```

The Job condition of type "Suspended" with status "True" means the Job is
suspended; the `lastTransitionTime` field can be used to determine how long the
Job has been suspended for. If the status of that condition is "False", then the
Job was previously suspended and is now running. If such a condition does not
exist in the Job's status, the Job has never been suspended.

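The three cases above can be checked from a script. The following is a sketch only: `job_suspend_state` is a hypothetical helper name, and the labels it prints are made up for this example. It reads the status of the "Suspended" condition using a `kubectl` JSONPath filter expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "suspended", "resumed", or "never-suspended"
# for the Job named in $1. The labels are illustrative only.
job_suspend_state() {
  # Select the status of the condition whose type is "Suspended".
  # The output is empty if the Job has no such condition.
  status="$(kubectl get "job/$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Suspended")].status}')" || return 1
  case "$status" in
    True)  echo "suspended" ;;
    False) echo "resumed" ;;
    *)     echo "never-suspended" ;;
  esac
}
```

For example, `job_suspend_state myjob` would print `suspended` for a Job whose status matches the output shown earlier.
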
Events are also created when the Job is suspended and resumed:

```shell
kubectl describe jobs/myjob
```

```
Name:           myjob
...
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  12m   job-controller  Created pod: myjob-hlrpl
  Normal  SuccessfulDelete  11m   job-controller  Deleted pod: myjob-hlrpl
  Normal  Suspended         11m   job-controller  Job suspended
  Normal  SuccessfulCreate  3s    job-controller  Created pod: myjob-jvb44
  Normal  Resumed           3s    job-controller  Job resumed
```

The last four events, particularly the "Suspended" and "Resumed" events, are
a direct result of toggling the `.spec.suspend` field. In the time between
these two events, no Pods were created, but Pod creation restarted
as soon as the Job was resumed.

### Specifying your own Pod selector

Normally, when you create a Job object, you do not specify `.spec.selector`.

@@ -170,6 +170,7 @@ different Kubernetes components.
| `StorageVersionAPI` | `false` | Alpha | 1.20 | |
| `StorageVersionHash` | `false` | Alpha | 1.14 | 1.14 |
| `StorageVersionHash` | `true` | Beta | 1.15 | |
| `SuspendJob` | `false` | Alpha | 1.21 | |
| `Sysctls` | `true` | Beta | 1.11 | |
| `TTLAfterFinished` | `false` | Alpha | 1.12 | |
| `TopologyManager` | `false` | Alpha | 1.16 | 1.17 |

@@ -775,6 +776,9 @@ Each feature gate is designed for enabling/disabling a specific feature:
  options can be specified to ensure that the specified number of process IDs
  will be reserved for the system as a whole and for Kubernetes system daemons
  respectively.
- `SuspendJob`: Enable support to suspend and resume Jobs. See
  [the Jobs docs](/docs/concepts/workloads/controllers/job/) for
  more details.
- `Sysctls`: Enable support for namespaced kernel parameters (sysctls) that can be
  set for each pod. See
  [sysctls](/docs/tasks/administer-cluster/sysctl-cluster/) for more details.