Queue
@k82cn; April 17, 2019
Motivation
Queue was introduced in kube-batch long ago as an internal feature: all jobs are submitted to the same queue, named default. As more and more users want to share resources with each other through queues, this proposal covers the primary queue features needed to achieve that.
Function Specification
The queue is cluster-scoped, so users from different namespaces can share resources within a Queue. The following section defines the API of the queue.
API
type Queue struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// Specification of the desired behavior of a queue
	// +optional
	Spec QueueSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`

	// Current status of Queue
	// +optional
	Status QueueStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}
type QueueSpec struct {
	// The weight of the queue, used to share resources between queues.
	Weight int32 `json:"weight,omitempty" protobuf:"varint,1,opt,name=weight"`
}

type QueueStatus struct {
	// The number of jobs in Unknown status
	Unknown int32 `json:"unknown,omitempty" protobuf:"varint,1,opt,name=unknown"`
	// The number of jobs in Running status
	Running int32 `json:"running,omitempty" protobuf:"varint,2,opt,name=running"`
	// The number of jobs in Pending status
	Pending int32 `json:"pending,omitempty" protobuf:"varint,3,opt,name=pending"`
	// The number of jobs in Completed status
	Completed int32 `json:"completed,omitempty" protobuf:"varint,4,opt,name=completed"`
	// The number of jobs in Failed status
	Failed int32 `json:"failed,omitempty" protobuf:"varint,5,opt,name=failed"`
	// The number of jobs in Aborted status
	Aborted int32 `json:"aborted,omitempty" protobuf:"varint,6,opt,name=aborted"`
}
QueueController
The QueueController will manage the lifecycle of queue:
- Watching PodGroup/Job for status
- If Queue was deleted, also delete all related PodGroup/Job in the queue
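The status-watching duty can be pictured as a simple tally over the PodGroups of one queue. The following is only a minimal sketch under assumptions: PodGroupInfo and countStatus are hypothetical names introduced for illustration, and the status strings are taken from the QueueStatus field names rather than from any real kube-batch type.

```go
package main

// PodGroupInfo is a hypothetical, simplified view of a PodGroup/Job as the
// controller might see it; it is not a real kube-batch type.
type PodGroupInfo struct {
	Queue  string // name of the queue the PodGroup was submitted to
	Status string // job status; empty when no job status is available
}

// QueueStatus mirrors the counters of the QueueStatus API type.
type QueueStatus struct {
	Unknown, Running, Pending, Completed, Failed, Aborted int32
}

// countStatus tallies the PodGroups that belong to the given queue.
// Any status that is empty or not recognised is counted as Unknown.
func countStatus(queue string, groups []PodGroupInfo) QueueStatus {
	var s QueueStatus
	for _, g := range groups {
		if g.Queue != queue {
			continue // belongs to another queue
		}
		switch g.Status {
		case "Running":
			s.Running++
		case "Pending":
			s.Pending++
		case "Completed":
			s.Completed++
		case "Failed":
			s.Failed++
		case "Aborted":
			s.Aborted++
		default:
			s.Unknown++
		}
	}
	return s
}
```

Falling through to Unknown in the default branch keeps the tally total equal to the number of PodGroups in the queue, even when a PodGroup carries no job status.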
Admission Controller
The admission controller will check the queue of a PodGroup/Job at creation time:
- if the queue does not exist, the creation will be rejected
- if the queue is releasing, the creation will also be rejected
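The two rejection rules above can be sketched as a single validation function. This is an illustration only: the proposal does not define how the "releasing" condition is stored, so QueueState, its constants, and validateQueue are assumptions made for this example, not part of the Queue API.

```go
package main

import "fmt"

// QueueState is a hypothetical lifecycle marker; the proposal only says a
// queue may be "releasing" and does not define a state field.
type QueueState string

const (
	QueueOpen      QueueState = "Open"      // assumed name for a usable queue
	QueueReleasing QueueState = "Releasing" // assumed name for a queue being deleted
)

// validateQueue returns an error when the creation of a PodGroup/Job that
// targets the named queue must be rejected. queues maps queue names to
// their (assumed) state.
func validateQueue(name string, queues map[string]QueueState) error {
	state, found := queues[name]
	if !found {
		return fmt.Errorf("queue %q does not exist", name)
	}
	if state == QueueReleasing {
		return fmt.Errorf("queue %q is releasing", name)
	}
	return nil
}
```

A real admission webhook would look the queue up through the API server instead of a map; the map keeps the sketch self-contained.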
Feature Interaction
Customized Job/PodGroup
If a PodGroup is created by a customized controller, the QueueController will count it under the Unknown status, because a PodGroup focuses on the scheduling specification, which does not include the customized job's status.
CLI
The command line is also enhanced for operations engineers. Three sub-commands are introduced as follows:
create:
The create command is used to create a queue with a weight; for example, the following command will create a queue named myqueue with weight 10.
$ vcctl queue create --name myqueue --weight 10
view:
The view command is used to show the details of a queue, e.g. its creation time; the following command will show the details of the queue myqueue.
$ vcctl queue view myqueue
list:
The list command is used to show all queues available to the current user.
$ vcctl queue list
Name     Weight  Total  Pending  Running  ...
myqueue  10      10     5        5
Scheduler
- Proportion plugin:
  The proportion plugin is used to share resources between Queues by weight. The deserved resource of a queue is (weight/total-weight) * total-resource. When allocating resources, the plugin will not allocate more to a queue than its deserved resources.
- Reclaim action:
  The reclaim action will go through all queues to reclaim resources from others according to ReclaimableFn's return value; the time complexity is O(n^2). In ReclaimableFn, both proportion and gang will take effect: 1. proportion makes sure the queue will not be under-used after reclaim, 2. gang makes sure the job will not be reclaimed if its minAvailable > 1.
- Backfill action:
  When the allocate action assigns resources to each queue, there is a case (kube-batch#492) in which resources may sit idle unnecessarily because of the proportion plugin: two queues each have one pending job, and the deserved resources of each queue cannot meet the requirement of its job. In such a case, the backfill action will ignore the deserved guarantee of the queues to fill idle resources as much as possible. This introduces another potential case in which an incoming smaller job is blocked; this case will be handled by the reserved resources of each queue in another project.
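As a worked illustration of the proportion plugin's formula, (weight/total-weight) * total-resource, the sketch below computes the deserved share of every queue for one scalar resource. It is a simplification under assumptions: the function name deserved and the use of a single float64 amount (for example milli-CPU) are made up for this example, and the real plugin works on multi-dimensional resources.

```go
package main

// deserved returns each queue's deserved share of a single scalar resource:
// (weight / total-weight) * total-resource.
// weights maps queue names to their Spec.Weight; total is the cluster-wide
// amount of the resource.
func deserved(weights map[string]int32, total float64) map[string]float64 {
	var sum int64
	for _, w := range weights {
		sum += int64(w)
	}

	shares := make(map[string]float64, len(weights))
	if sum == 0 {
		return shares // no weighted queues, so nothing to share out
	}
	for name, w := range weights {
		shares[name] = float64(w) / float64(sum) * total
	}
	return shares
}
```

For instance, with two queues of weight 1 and 3 and 8000 milli-CPU in total, the queues deserve 2000 and 6000 milli-CPU respectively; allocation to a queue stops once it reaches that amount.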