Merge pull request #26466 from adtac/suspend-1.21
job.md: add section on suspended jobs
commit 6adc893ffa
@@ -16,7 +16,8 @@ weight: 50
A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate.
As pods successfully complete, the Job tracks the successful completions. When a specified number
of successful completions is reached, the task (i.e., the Job) is complete. Deleting a Job will clean up
the Pods it created. Suspending a Job will delete its active Pods until the Job
is resumed.

A simple case is to create one Job object in order to reliably run one Pod to completion.
The Job object will start a new Pod if the first Pod fails or is deleted (for example

@@ -404,6 +405,107 @@ Here, `W` is the number of work items.

## Advanced usage

### Suspending a Job

{{< feature-state for_k8s_version="v1.21" state="alpha" >}}

{{< note >}}
Suspending Jobs is available in Kubernetes versions 1.21 and above. You must
enable the `SuspendJob` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
on the [API server](/docs/reference/command-line-tools-reference/kube-apiserver/)
and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/)
in order to use this feature.
{{< /note >}}

When a Job is created, the Job controller will immediately begin creating Pods
to satisfy the Job's requirements and will continue to do so until the Job is
complete. However, you may want to temporarily suspend a Job's execution and
resume it later. To suspend a Job, update the Job's `.spec.suspend` field to
true; to resume it later, update the field back to false. Creating a Job with
`.spec.suspend` set to true creates it in the suspended state.
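
One way to toggle the field from the command line is `kubectl patch`. The sketch below wraps it in a small shell function; the function name `set_job_suspend` and the `KUBECTL` variable (which lets you substitute `echo` to print the command instead of running it) are illustrative assumptions, not part of kubectl.

```shell
# Hypothetical helper (not part of kubectl): set .spec.suspend on a Job.
# KUBECTL defaults to "kubectl"; setting KUBECTL=echo prints the command instead.
set_job_suspend() {
  job="$1"     # Job name, e.g. myjob
  value="$2"   # "true" to suspend, "false" to resume
  case "$value" in
    true|false) ;;
    *) echo "usage: set_job_suspend JOB true|false" >&2; return 1 ;;
  esac
  # Strategic merge patch that changes only .spec.suspend.
  "${KUBECTL:-kubectl}" patch "job/${job}" --type=strategic \
    --patch "{\"spec\":{\"suspend\":${value}}}"
}
```

For example, `set_job_suspend myjob true` suspends `myjob`, and `set_job_suspend myjob false` resumes it. Because the patch touches only `.spec.suspend`, the rest of the Job spec is left as it was.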

When a Job is resumed from suspension, its `.status.startTime` field will be
reset to the current time. This means that the `.spec.activeDeadlineSeconds`
timer will be stopped and reset when a Job is suspended and resumed.
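
Because the deadline is counted from `.status.startTime`, the time at which a resumed Job would reach its deadline can be estimated by adding `.spec.activeDeadlineSeconds` to the new start time. The helper below is a hypothetical sketch of that arithmetic; it assumes GNU `date` (coreutils), since the `-d` option it uses to parse and format timestamps is not available in every `date` implementation.

```shell
# Hypothetical helper: given a Job's .status.startTime (UTC, RFC 3339) and its
# .spec.activeDeadlineSeconds, print when the deadline would be reached.
# Requires GNU date for -d parsing and "@epoch" input.
job_deadline() {
  start_time="$1"         # e.g. the value of .status.startTime
  deadline_seconds="$2"   # the value of .spec.activeDeadlineSeconds
  start_epoch=$(date -u -d "$start_time" +%s) || return 1
  date -u -d "@$((start_epoch + deadline_seconds))" +%Y-%m-%dT%H:%M:%SZ
}
```

For example, `job_deadline 2021-02-05T13:13:48Z 600` prints `2021-02-05T13:23:48Z`. The inputs can be read with `kubectl get job myjob -o jsonpath='{.status.startTime}'` and `kubectl get job myjob -o jsonpath='{.spec.activeDeadlineSeconds}'`; after a suspend and resume, the new `startTime` yields a correspondingly later deadline.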

Remember that suspending a Job will delete all active Pods. When the Job is
suspended, your [Pods will be terminated](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination)
with a SIGTERM signal. The Pod's graceful termination period will be honored, and
your Pod must handle the signal within that period. This may involve saving
progress for later or undoing changes. Pods terminated this way will not count
towards the Job's `completions` count.
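
A container's entrypoint can cooperate with suspension by trapping SIGTERM. The bash sketch below is a hypothetical worker, not a Kubernetes API: the function name `run_worker`, the `PROGRESS_FILE` and `TOTAL_ITEMS` variables, and the one-second `sleep` standing in for real work are all assumptions. On SIGTERM it writes the index of the next unfinished item to a checkpoint file and exits, so a Pod created after the Job is resumed can continue from there, provided the file lives on storage that outlives the Pod (for example, a PersistentVolume).

```shell
# Hypothetical worker for a Job's container (names and variables are
# illustrative, not part of Kubernetes). On SIGTERM, for example when the
# Job is suspended, it checkpoints the index of the next unfinished item.
run_worker() {
  progress_file="${PROGRESS_FILE:-/tmp/progress}"
  total_items="${TOTAL_ITEMS:-1000}"

  # Resume from the checkpoint left by an earlier Pod, if there is one.
  next_item=0
  if [ -f "$progress_file" ]; then
    next_item=$(cat "$progress_file")
  fi

  # On SIGTERM, save progress within the grace period and exit.
  # 143 = 128 + 15 (SIGTERM), the conventional status for this case.
  trap 'echo "$next_item" > "$progress_file"; exit 143' TERM

  while [ "$next_item" -lt "$total_items" ]; do
    # Stand-in for one unit of real work. Running it in the background and
    # waiting lets the trap fire as soon as the signal arrives.
    sleep 1 &
    wait $!
    next_item=$((next_item + 1))
  done

  # All items processed: remove the checkpoint.
  rm -f "$progress_file"
}
```

Running each unit of work in the background and calling `wait` is the key design choice: bash postpones a trap until a foreground command finishes, whereas the `wait` builtin returns as soon as a trapped signal arrives, which keeps the checkpoint well inside the grace period.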

Here is an example of a Job definition in the suspended state:

```shell
kubectl get job myjob -o yaml
```

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  suspend: true
  parallelism: 1
  completions: 5
  template:
    spec:
      ...
```

The Job's status can be used to determine if a Job is suspended or has been
suspended in the past:

```shell
kubectl get jobs/myjob -o yaml
```

```yaml
apiVersion: batch/v1
kind: Job
# .metadata and .spec omitted
status:
  conditions:
  - lastProbeTime: "2021-02-05T13:14:33Z"
    lastTransitionTime: "2021-02-05T13:14:33Z"
    status: "True"
    type: Suspended
  startTime: "2021-02-05T13:13:48Z"
```

The Job condition of type "Suspended" with status "True" means the Job is
suspended; the `lastTransitionTime` field can be used to determine how long the
Job has been suspended for. If the status of that condition is "False", then the
Job was previously suspended and is now running. If such a condition does not
exist in the Job's status, the Job has never been suspended.
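
These rules can be checked from a script by reading only the `Suspended` condition with a kubectl JSONPath filter. In this sketch, the function name `job_suspend_state` and the `KUBECTL` override (useful for testing without a cluster) are hypothetical.

```shell
# Hypothetical helper: report whether a Job is suspended, was suspended in the
# past, or has never been suspended, based on its "Suspended" condition.
# KUBECTL defaults to "kubectl" and can be overridden for testing.
job_suspend_state() {
  job="$1"
  # The JSONPath filter selects the status of the condition whose type is
  # "Suspended"; the output is empty when no such condition exists.
  cond_status=$("${KUBECTL:-kubectl}" get "job/${job}" \
    -o jsonpath='{.status.conditions[?(@.type=="Suspended")].status}') || return 1
  case "$cond_status" in
    True)  echo "suspended" ;;
    False) echo "resumed after an earlier suspension" ;;
    "")    echo "never suspended" ;;
    *)     echo "unexpected condition status: ${cond_status}" >&2; return 1 ;;
  esac
}
```

With the example status above, `job_suspend_state myjob` would print `suspended`.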

Events are also created when the Job is suspended and resumed:

```shell
kubectl describe jobs/myjob
```

```
Name:           myjob
...
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  12m   job-controller  Created pod: myjob-hlrpl
  Normal  SuccessfulDelete  11m   job-controller  Deleted pod: myjob-hlrpl
  Normal  Suspended         11m   job-controller  Job suspended
  Normal  SuccessfulCreate  3s    job-controller  Created pod: myjob-jvb44
  Normal  Resumed           3s    job-controller  Job resumed
```

The last four events, particularly the "Suspended" and "Resumed" events, are
a direct result of toggling the `.spec.suspend` field. Between these two
events, no Pods were created; Pod creation restarted as soon as the Job was
resumed.

### Specifying your own Pod selector

Normally, when you create a Job object, you do not specify `.spec.selector`.

@@ -170,6 +170,7 @@ different Kubernetes components.
| `StorageVersionAPI` | `false` | Alpha | 1.20 | |
| `StorageVersionHash` | `false` | Alpha | 1.14 | 1.14 |
| `StorageVersionHash` | `true` | Beta | 1.15 | |
| `SuspendJob` | `false` | Alpha | 1.21 | |
| `Sysctls` | `true` | Beta | 1.11 | |
| `TTLAfterFinished` | `false` | Alpha | 1.12 | |
| `TopologyManager` | `false` | Alpha | 1.16 | 1.17 |

@@ -775,6 +776,9 @@ Each feature gate is designed for enabling/disabling a specific feature:
  options can be specified to ensure that the specified number of process IDs
  will be reserved for the system as a whole and for Kubernetes system daemons
  respectively.
- `SuspendJob`: Enable support to suspend and resume Jobs. See
  [the Jobs docs](/docs/concepts/workloads/controllers/job/) for
  more details.
- `Sysctls`: Enable support for namespaced kernel parameters (sysctls) that can be
  set for each pod. See
  [sysctls](/docs/tasks/administer-cluster/sysctl-cluster/) for more details.