volcano/docs/design/queue/queue.md

4.6 KiB

Queue

@k82cn; April 17, 2019

Motivation

Queue was introduced in kube-batch long time ago as an internal feature, which makes all jobs are submitted to the same queue, named default. As more and more users would like to share resources with each other by queue, this proposal is going to cover primary features of queue achieve that.

Function Specification

The queue is cluster level, so the user from different namespaces can share resource within a Queue. The following section defines the api of queue.

API

type Queue struct {
    metav1.TypeMeta `json:",inline"`

    metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

    // Specification of the desired behavior of a queue
    // +optional
    Spec QueueSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`

    // Current status of Queue
    // +optional
    Status QueueStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}

type QueueSpec struct {
    // The weight of queue to share the resources with each other.
    Weight int32 `json:"weight,omitempty" protobuf:"bytes,1,opt,name=weight"`
}

type QueueStatus struct {
    // The number of job in Unknown status
    Unknown int32 `json:"running,omitempty" protobuf:"bytes,1,opt,name=running"`
    // The number of job in Running status
    Running int32 `json:"running,omitempty" protobuf:"bytes,2,opt,name=running"`
    // The number of job in Pending status
    Pending int32 `json:"pending,omitempty" protobuf:"bytes,3,opt,name=pending"`
    // The number of job in Completed status
    Completed int32 `json:"completed,omitempty" protobuf:"bytes,4,opt,name=completed"`
    // The number of job in Failed status
    Failed int32 `json:"failed,omitempty" protobuf:"bytes,5,opt,name=failed"`
    // The number of job in Aborted status
    Aborted int32 `json:"aborted,omitempty" protobuf:"bytes,6,opt,name=aborted"`
}

QueueController

The QueueController will manage the lifecycle of queue:

  1. Watching PodGroup/Job for status
  2. If Queue was deleted, also delete all related PodGroup/Job in the queue

Admission Controller

The admission controller will check PodGroup/Job 's queue when creation:

  1. if the queue does not exist, the creation will be rejected
  2. if the queue is releasing, the creation will be also rejected

Feature Interaction

Customized Job/PodGroup

If the PodGroup is created by customized controller, the QueueController will count those PodGroup into Unknown status; because PodGroup focus on scheduling specification which did not include customized job's status.

cli

Command line is also enhanced for operator engineers. Three sub-commands are introduced as follow:

create:

create command is used to create a queue with weight; for example, the following command will create a queue named myqueue with weight 10.

$ vcctl queue create --name myqueue --weight 10

view:

view command is used to show the detail of a queue, e.g. creation time; the following command will show the detail of queue myqueue

$ vcctl queue view myqueue

list:

list command is used to show all available queues to current user

$ vcctl queue list
Name      Weight  Total  Pending  Running ...
myqueue   10      10     5        5

Scheduler

  • Proportion plugin:

    Proportion plugin is used to share resource between Queues by weight. The deserved resource of a queue is (weight/total-weight) * total-resource. When allocating resources, it will not allocate resource more than its deserved resources.

  • Reclaim action:

    reclaim action will go through all queues to reclaim others by ReclaimableFn's return value; the time complexity is O(n^2). In ReclaimableFn, both proportion and gang will take effect: 1. proportion makes sure the queue will not be under-used after reclaim, 2. gang makes sure the job will not be reclaimed if its minAvailable > 1.

  • Backfill action:

    When allocate action assign resources to each queue, there's a case that (kube-batch#492) the resources maybe unnecessary idle because of proportion plugin: there are one pending job in two queue each, and the deserved resources of each queue can not meet the requirement of their jobs. In such case, backfill action will ignore deserved guarantee of queue to fill idle resources as much as possible. This introduces another potential case that the coming smaller job is blocked; this case will be handle by reserved resources of each queue in other project.