Merge pull request #5674 from ingvagabund/sig-scheduling-code-architecture
sig-scheduling: Scheduler code hierarchy overview
This commit is contained in:
		
						commit
						fce5c5dd57
					
				
										
											Binary file not shown.
										
									
								
							| After Width: | Height: | Size: 28 KiB | 
|  | @ -0,0 +1,286 @@ | ||||||
|  | # Scheduler code hierarchy overview | ||||||
|  | 
 | ||||||
|  | ## Introduction | ||||||
|  | 
 | ||||||
|  | The scheduler watches for newly created Pods that have no Node assigned. | ||||||
|  | For every Pod that the scheduler discovers, the scheduler becomes responsible | ||||||
|  | for finding the best Node for that Pod to run on. | ||||||
|  | Scheduling in general is quite an extensive field in computer science which takes | ||||||
|  | into account various range of constraints and limitations. | ||||||
|  | Each workload may require a different approach to achieve optimal scheduling results. | ||||||
|  | The kube-scheduler provided by Kubernetes project was constructed with a goal | ||||||
|  | to provide high throughput at the cost of being simple. | ||||||
|  | To help in building a scheduler (the default or a custom one) and to share | ||||||
|  | elements of the scheduling logic, | ||||||
|  | [the scheduling framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/) | ||||||
|  | was implemented. | ||||||
|  | The framework does not provide all pieces to build a new scheduler from scratch. | ||||||
|  | Queues, caches, scheduling algorithms and other building elements are still needed to assemble | ||||||
|  | a fully functional unit. This document aims at describing how all the individual | ||||||
|  | pieces are put together and what’s their role in the overall architecture | ||||||
|  | so a developer can quickly orient in the code. | ||||||
|  | 
 | ||||||
|  | ## Scheduling a pod | ||||||
|  | 
 | ||||||
|  | The default scheduler instance has a loop running indefinitely | ||||||
|  | which (everytime there’s a pod) is responsible for invoking the scheduling logic | ||||||
|  | and making sure a pod gets either a node assigned or requeued for future processing. | ||||||
|  | Each loop consists of a blocking scheduling and a non-blocking binding cycle. | ||||||
|  | The scheduling cycle is responsible for running the scheduling algorithm selecting | ||||||
|  | the most suitable node for placing the pod. | ||||||
|  | The binding cycle makes sure the kube-apiserver is made aware of the selected | ||||||
|  | node at the right time. A pod may be bound immediately, or in the case of gang scheduling, | ||||||
|  | wait until all its sibling pods have their node assigned. | ||||||
|  | 
 | ||||||
|  | ### Scheduling cycle | ||||||
|  | 
 | ||||||
|  | Each cycle honors the following steps: | ||||||
|  | 1. Get the next pod for scheduling | ||||||
|  | 1. Schedule a pod with provided algorithm | ||||||
|  | 1. If a pod fails to be scheduled due to `FitError`, run preemption plugin in | ||||||
|  |    `PostFilterPlugin` (if the plugin is registered) to nominate a node where | ||||||
|  |    the pods can run. If preemption was successful, | ||||||
|  |    let the current pod be aware of the nominated node. | ||||||
|  |    Handle the error, get the next pod and start over. | ||||||
|  | 1. If the scheduling algorithm finds a suitable node, store the pod into | ||||||
|  |    the scheduler cache (`AssumePod` operation) and run plugins from the `Reserve` | ||||||
|  |    and `Permit` extension point in that order. In case any of the plugins fails, | ||||||
|  |    end the current scheduling cycle, increase relevant metrics and handle | ||||||
|  |    the scheduling error through the `Error` handler. | ||||||
|  | 1. Upon successfully running all extension points, proceed to the binding cycle. | ||||||
|  |    At the same time start processing another pod (if there’s any). | ||||||
|  | 
 | ||||||
|  | ### Binding cycle | ||||||
|  | 
 | ||||||
|  | Consists of the following four steps ran in the same order: | ||||||
|  | - Invoking [WaitOnPermit](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L560) | ||||||
|  |   (internal API) of plugins from `Permit` extension point. Some plugins from the extension point | ||||||
|  |   may send a request for an operation requiring to wait for a condition | ||||||
|  |   (e.g. wait for additional resources to be available or wait for all pods | ||||||
|  |   in a gang to be assumed). | ||||||
|  |   Under the hood, `WaitOnPermit` waits for such a condition to be met within a timeout threshold. | ||||||
|  | - Invoking plugins from [PreBind](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L580) extension point | ||||||
|  | - Invoking plugins from [Bind](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L592) extension point | ||||||
|  | - Invoking plugins from [PostBind](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L611) extension point | ||||||
|  | 
 | ||||||
|  | In case of processing of any of the extension points fails, `Unreserve` operation | ||||||
|  | of all `Reserve` plugins is invoked (e.g. free resources allocated for a gang of pods). | ||||||
|  | 
 | ||||||
|  | ## Configuring and assembling the scheduler | ||||||
|  | 
 | ||||||
|  | The scheduler codebase spans across various locations. Last but not least to mention: | ||||||
|  | - [cmd/kube-scheduler/app](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app): | ||||||
|  |   location of the controller code alongside definition of CLI arguments (honors the standard setup for all Kubernetes controllers) | ||||||
|  | - [pkg/scheduler](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler): | ||||||
|  |   the default scheduler codebase root directory | ||||||
|  | - [pkg/scheduler/core](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core): | ||||||
|  |   location of the default scheduling algorithm | ||||||
|  | - [pkg/scheduler/framework](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/framework): | ||||||
|  |   scheduling framework alongside plugins | ||||||
|  | - [pkg/scheduler/internal](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/internal): | ||||||
|  |   implementation of the cache, queues and other internal elements | ||||||
|  | - [staging/src/k8s.io/kube-scheduler](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/staging/src/k8s.io/kube-scheduler): | ||||||
|  |   location of ComponentConfig API types | ||||||
|  | - [test/e2e/scheduling](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/test/e2e/scheduling): | ||||||
|  |   scheduling e2e | ||||||
|  | - [test/integration/scheduler](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/test/integration/scheduler) | ||||||
|  |   scheduling integration tests | ||||||
|  | - [test/integration/scheduler_perf](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/test/integration/scheduler_perf) | ||||||
|  |   scheduling performance benchmarks | ||||||
|  | 
 | ||||||
|  | ### Initial startup configuration | ||||||
|  | 
 | ||||||
|  | Code under `cmd/kube-scheduler/app` is responsible for collecting scheduler | ||||||
|  | configuration and initializing logic allowing the kube-scheduler to run | ||||||
|  | as part of the Kubernetes control plane. The code includes: | ||||||
|  | - Initializing [command line options](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L96) | ||||||
|  |   (along with a default `ComponentConfig`) and [validation](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L300) | ||||||
|  | - Initializing [metrics](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L238) | ||||||
|  |   (`/metrics`), [health check](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L268) | ||||||
|  |   (`/healthz`) and [other handlers](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L225-L236) | ||||||
|  |   (authorization, authentication, panic recovery, etc.) | ||||||
|  | - Reading and defaulting configuration of [KubeSchedulerConfiguration](https://github.com/kubernetes/kubernetes/blob/4740173f3378ef9d0dc59b0aa9299444a97d0818/pkg/scheduler/apis/config/types.go#L49-L106) | ||||||
|  | - Building a registry with plugins (in-tree, [out-of-tree](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L312-L317)) | ||||||
|  | - Initializing the scheduler with various options such as [profiles, algorithm source, pod back off, etc.](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L326-L337) | ||||||
|  | - Invocation of [LogOrWriteConfig](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L342) which logs the final scheduler configuration for debugging purposes | ||||||
|  | - Right before running, `/configz` [is registered](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L141), | ||||||
|  |   [events broadcaster started](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L148), | ||||||
|  |   [leader election initiated](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L198-L216), | ||||||
|  |    and [the server](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L185) | ||||||
|  |    with all the configured handlers and [informers](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L192) | ||||||
|  |    is started. | ||||||
|  | 
 | ||||||
|  | Once initialized, the scheduler can run. | ||||||
|  | 
 | ||||||
|  | In more detail, there’s a [Setup](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/cmd/kube-scheduler/app/server.go#L299) | ||||||
|  | function accomplishing what is essentially | ||||||
|  | the initialization of the scheduler’s core process. | ||||||
|  | First, it validates the options that have been passed through (the flags added | ||||||
|  | in `NewSchedulerCommand()` are set directly on this options struct’s fields). | ||||||
|  | If the options passed so far don’t raise any errors, it then calls `opts.Config()` | ||||||
|  | which sets up the final internal settings including secure serving, leader election, | ||||||
|  | clients, and begins parsing options related to the algorithm source | ||||||
|  | (like loading config files and initializing empty profiles as well as handling | ||||||
|  | deprecated options like policy config). The next lines call `c.Complete()` to complete | ||||||
|  | the config by filling in any empty values. At this point any out-of-tree plugins | ||||||
|  | are registered by creating a blank registry and adding entries in that registry | ||||||
|  | for each plugin’s New function. It should be noted that the Registry is simply | ||||||
|  | a map of plugin names to their factory functions. For the default scheduler, | ||||||
|  | this step does nothing (because our main function in `cmd/kube-scheduler/scheduler.go` | ||||||
|  | passes nothing to `NewSchedulerCommand()`). | ||||||
|  | This means the default set of plugins is initialized in `scheduler.New()`. | ||||||
|  | 
 | ||||||
|  | Given the initialization is performed outside the scheduling framework, | ||||||
|  | different consumers of the framework can initialize the environment differently | ||||||
|  | to cover their needs. For example, a simulator can inject its own object | ||||||
|  | through informers. Or custom plugins may be provided instead of the default ones. | ||||||
|  | Known consumers of the scheduling framework: | ||||||
|  | - [cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/simulator/scheduler_based_predicates_checker.go#L48-L79) | ||||||
|  | - [cluster-capacity](https://github.com/kubernetes-sigs/cluster-capacity/blob/8e9c2dcf3644cb5f73fca3d35d4e22899c265ad5/pkg/framework/simulator.go#L370-L383) | ||||||
|  | 
 | ||||||
|  | ### Assembling the scheduler | ||||||
|  | 
 | ||||||
|  | The code is located under `pkg/scheduler`. | ||||||
|  | This is where implementation of the default scheduler lives. | ||||||
|  | Various elements of the scheduler are initialized and put together here: | ||||||
|  | - Default scheduling options such as node percentage, initial and maximum backoff, profiles | ||||||
|  | - Scheduler cache and queues | ||||||
|  | - Scheduling profiles instantiated to tailor a framework for each profile | ||||||
|  |   to better suit pod placement (each profile defines a set of plugins to use) | ||||||
|  | - Handler functions for getting the next pod for scheduling (`NextPod`) and error handling (`Error`) | ||||||
|  | 
 | ||||||
|  | The following steps are taken during the process of creating a scheduler instance: | ||||||
|  | - Scheduler [cache is initialized](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L206) | ||||||
|  | - Both in-tree and out-of-tree registries with plugins are [merged together](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L208-L211) | ||||||
|  | - Metrics are [registered](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L232) | ||||||
|  | - [Configurator](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L215-L230) | ||||||
|  |   building a scheduler instance (wiring the cache, plugin registry, | ||||||
|  |   scheduling algorithm and other elements together) | ||||||
|  | - Event handlers [are registered](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L273) | ||||||
|  |   to allow the scheduler to react on changes in PVs, | ||||||
|  |   PVCs, services and other objects relevant for scheduling (eventually, | ||||||
|  |   each plugin will define a set of events on which it reacts, | ||||||
|  |   see [kubernetes/kubernetes#100347](https://github.com/kubernetes/kubernetes/issues/100347) | ||||||
|  |   for more details). | ||||||
|  | 
 | ||||||
|  | The following diagram shows how individual elements are connected together | ||||||
|  | once initialized. Event handlers make sure pods are properly enqueued | ||||||
|  | in the [scheduling queues](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-scheduling/scheduler_queues.md), | ||||||
|  | the cache is updated with pods and nodes | ||||||
|  | as they go (to provide up-to-date snapshot). Scheduling algorithm and the binding cycle | ||||||
|  | have the right instances of the framework available (one instance of the framework per a profile). | ||||||
|  | 
 | ||||||
|  |  | ||||||
|  | 
 | ||||||
|  | #### Scheduling framework | ||||||
|  | 
 | ||||||
|  | Its code is currently located under `pkg/scheduler/framework`. | ||||||
|  | It contains [various plugins](https://github.com/kubernetes/kubernetes/tree/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/framework/plugins) | ||||||
|  | responsible for filtering and scoring nodes (among others). | ||||||
|  | Used as building blocks for any scheduling algorithm. | ||||||
|  | 
 | ||||||
|  | When a [plugin is initialized](https://github.com/kubernetes/kubernetes/blob/4740173f3378ef9d0dc59b0aa9299444a97d0818/pkg/scheduler/framework/runtime/framework.go#L310), | ||||||
|  | it’s passed a [framework handler](https://github.com/kubernetes/kubernetes/blob/4740173f3378ef9d0dc59b0aa9299444a97d0818/pkg/scheduler/framework/runtime/framework.go#L251-L264) | ||||||
|  | which provides interfaces to access and/or manipulate pods, nodes, clientset, | ||||||
|  | event recorder and other handlers every plugin needs to implement its functionality. | ||||||
|  | 
 | ||||||
|  | #### Scheduler cache | ||||||
|  | 
 | ||||||
|  | Cache is responsible for capturing the current state of a cluster. | ||||||
|  | Keeping a list of nodes and assumed pods alongside states of pods and images. | ||||||
|  | The cache provides methods for reconciling pod and node objects | ||||||
|  | (invoked through event handlers) keeping the state of the cluster up to date. | ||||||
|  | Allowing to update the snapshot of a cluster (to pin the cluster state while a scheduling | ||||||
|  | algorithm is run) with the latest state at the beginning of each scheduling cycle. | ||||||
|  | 
 | ||||||
|  | The cache also allows to run assume operation which temporarily stores a pod | ||||||
|  | in the cache and makes it look as the pod is actually already | ||||||
|  | running on a designated node for all consumers of the snapshot. | ||||||
|  | Assume operation exists to remove the time the pod actually gets updated | ||||||
|  | on the kube-apiserver side and thus increasing the scheduler’s throughput. | ||||||
|  | The following operations manipulate with the assumed pods: | ||||||
|  | - `AssumePod`: to signal the scheduling algorithm found a feasible node so the next | ||||||
|  |   pod can be attempted while the current pod enters the binding cycle | ||||||
|  | - `FinishBinding`: used to signal Bind finished so the pod can be removed | ||||||
|  |   from the list of assumed pods | ||||||
|  | - `ForgetPod`: removes pod from the list of assumed pods, used in case the pod | ||||||
|  |   fails to get processed in the binding cycle successfully | ||||||
|  |   (e.g. during `Reserve`, `Permit`, `PreBind` or `Bind` evaluation) | ||||||
|  | 
 | ||||||
|  | The cache keeps track of the following three metrics: | ||||||
|  | - `scheduler_cache_size_assumed_pods`: number of pods in the assume pods list | ||||||
|  | - `scheduler_cache_size_pods`: number of pods in the cache | ||||||
|  | - `scheduler_cache_size_nodes`: number of nodes in the cache | ||||||
|  | 
 | ||||||
|  | #### Snapshot | ||||||
|  | 
 | ||||||
|  | The [snapshot](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/internal/cache/snapshot.go) | ||||||
|  | captures the state of a cluster carrying information about all nodes | ||||||
|  | in a cluster and objects located on each node. | ||||||
|  | Namely node objects, pods assigned on each node, requested resources of all pods | ||||||
|  | on each node, node’s allocatable, images pulled and other information needed | ||||||
|  | to make a scheduling decision. Every time a pod is scheduled, | ||||||
|  | a snapshot of the current state of the cluster is captured. | ||||||
|  | To avoid a case where a pod or node gets changed while plugins are processed | ||||||
|  | which might lead to data inconsistency as some plugins might get a different | ||||||
|  | view of the cluster. | ||||||
|  | 
 | ||||||
|  | #### Configurator | ||||||
|  | 
 | ||||||
|  | A [configurator](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/factory.go#L90) | ||||||
|  | builds the scheduler instance by wiring plugins, cache, queues, | ||||||
|  | handlers and other elements together. Each profile [is initialized](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/factory.go#L138-L147) | ||||||
|  | with its own framework (with all frameworks sharing informers, event recorders, etc.). | ||||||
|  | 
 | ||||||
|  | At this point it’s still possible to have the configurator create the instance | ||||||
|  | [from a policy file](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/factory.go#L213). | ||||||
|  | Though, this approach is deprecated and will be removed | ||||||
|  | from the configuration eventually. Keeping only the kube scheduler configuration | ||||||
|  | as the only way to provide the configuration. | ||||||
|  | 
 | ||||||
|  | #### Default scheduling algorithm | ||||||
|  | 
 | ||||||
|  | The codebase defines a [ScheduleAlgorithm](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L61-L66) | ||||||
|  | interface. | ||||||
|  | Any implementation of the interface can be used as a scheduling algorithm. | ||||||
|  | There are two methods: | ||||||
|  | - `Schedule`: responsible for scheduling a pod using plugins from `PreFilter` | ||||||
|  |   up to `NormalizeScore` extension points, provides [ScheduleResult](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L70-L77) | ||||||
|  |   containing a scheduling decision (the most suitable nodes) with additional | ||||||
|  |   accompanying information such as how many nodes were evaluated | ||||||
|  |   and how many nodes were found feasible for scheduling. | ||||||
|  | - `Extenders`: currently exposed only for testing | ||||||
|  | 
 | ||||||
|  | Each cycle of the default algorithm implementation consists of: | ||||||
|  | 1. Taking the [current snapshot](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L101) | ||||||
|  |    from the scheduling cache | ||||||
|  | 1. [Filter out all nodes](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L110) | ||||||
|  |    not feasible for scheduling a pod | ||||||
|  |    1. Run [PreFilter plugins](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L230) | ||||||
|  |       first (preprocessing phase, e.g. computing pod [anti-]affinity relations) | ||||||
|  |    1. Run [Filter plugins](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L261) in parallel: | ||||||
|  |       filter out all nodes which does not satisfy pod’s constraints | ||||||
|  |       (e.g. sufficient resources, node affinity, etc.), including running filter extenders | ||||||
|  |    1. Run [PostFilter plugins](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/scheduler.go#L479) | ||||||
|  |       if no node can fit the incoming pod | ||||||
|  | 1. In case there are at least two feasible nodes for scheduling, run [scoring plugins](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L133): | ||||||
|  |    1. Run [PreScore plugins](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L427) | ||||||
|  |       first (preprocessing phase) | ||||||
|  |    1. Run [Score plugins](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L433) in parallel: | ||||||
|  |       each node is given a score vector (each coordinate corresponding to one plugin) | ||||||
|  |    1. Run [NormalizeScore plugins](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/framework/runtime/framework.go#L798): | ||||||
|  |       to have all plugins given a score in <0; 100> interval | ||||||
|  |    1. Compute [weighted score](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/framework/runtime/framework.go#L810-L828) | ||||||
|  |       for each node (each score plugin can have | ||||||
|  |       a weight assigned indicating how much its score is preferred over others) | ||||||
|  |    1. Run [score extenders](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L456) | ||||||
|  |       and add it to the total score of each node | ||||||
|  | 1. [Select](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L138) | ||||||
|  |    and [give back a node](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L141-L145) | ||||||
|  |    with the highest score. If there’s only a [single feasible node](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/core/generic_scheduler.go#L125-L131) | ||||||
|  |    skip `PreScore`, `Score` and `NormalizeScore` extension points | ||||||
|  |    and give back the node right away. If there’s no feasible node, report it. | ||||||
|  | 
 | ||||||
|  | Be aware of: | ||||||
|  | - If a plugin provides score normalization, it needs to return non-nil | ||||||
|  | when [ScoreExtensions()](https://github.com/kubernetes/kubernetes/blob/a651804427dd9a15bb91e1c4fb7a79994e4817a2/pkg/scheduler/framework/plugins/podtopologyspread/scoring.go#L254-L256) gets invoked | ||||||
		Loading…
	
		Reference in New Issue