add docs for descheduler

Signed-off-by: Garrybest <garrybest@foxmail.com>
2022-03-13 21:41:42 +08:00 · 2022-03-13 21:41:42 +08:00 · d7fa289966
parent a9ac97c016
commit d7fa289966
1 changed files with 143 additions and 0 deletions
--- a/docs/descheduler.md
+++ b/docs/descheduler.md
@ -0,0 +1,143 @@
+# Descheduler
+
+Users can divide the replicas of a workload among different member clusters according to the resources available in each cluster.
+However, the scheduler's decisions are based on its view of Karmada at the moment a new `ResourceBinding` appears for
+scheduling. Because Karmada's member clusters are highly dynamic and their state changes over time, it may become desirable
+to move already-running replicas to other clusters when a cluster runs short of resources. This can happen when some
+nodes of a cluster fail and the cluster no longer has enough resources to accommodate their pods, or when the estimators'
+results deviate from the actual state, which is to some extent inevitable.
+
+The karmada-descheduler checks all deployments periodically, every 2 minutes by default. In each period, it calls
+karmada-scheduler-estimator to find out how many replicas of a deployment are unschedulable in each of its target clusters.
+It then evicts those replicas by decreasing the corresponding replica counts in `spec.clusters`, which triggers
+karmada-scheduler to perform a 'Scale Schedule' based on the current situation. Note that the descheduler only takes effect
+when the replica scheduling strategy is dynamically divided, such as the `Divided` type with a `dynamicWeight` preference
+used in the example below.
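+
+To make the eviction step more concrete, here is a hypothetical bash sketch; it is not Karmada's implementation. For each
+target cluster it takes the replicas currently assigned in `spec.clusters` and an unschedulable count that stands in for
+what karmada-scheduler-estimator would report, then prints the reduced assignment and how many replicas were evicted.
+The function name `deschedule_pass` and its `<cluster>:<scheduled>:<unschedulable>` argument format are made up for
+illustration.
+
+```bash
+# Hypothetical sketch of one descheduling pass (not Karmada code).
+# Each argument is "<cluster>:<scheduled>:<unschedulable>"; <unschedulable> stands in
+# for the count that karmada-scheduler-estimator would report for that cluster.
+deschedule_pass() {
+  local entry cluster rest scheduled unschedulable remaining evicted=0
+  for entry in "$@"; do
+    cluster=${entry%%:*}
+    rest=${entry#*:}
+    scheduled=${rest%%:*}
+    unschedulable=${rest#*:}
+    # Decrease the cluster's replicas by the unschedulable count, never below zero.
+    remaining=$(( scheduled - unschedulable ))
+    if [ "$remaining" -lt 0 ]; then remaining=0; fi
+    evicted=$(( evicted + scheduled - remaining ))
+    echo "${cluster}: ${remaining}"
+  done
+  # The evicted replicas are what karmada-scheduler would re-place via 'Scale Schedule'.
+  echo "evicted: ${evicted}"
+}
+```
+
+For the scenario in the example below, `deschedule_pass member1:1:1 member2:1:0 member3:1:0` prints `member1: 0`,
+`member2: 1`, `member3: 1` and `evicted: 1`, leaving one replica for karmada-scheduler to place elsewhere.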
+
+## Prerequisites
+
+### Karmada has been installed
+
+We can install Karmada by following the [quick-start](https://github.com/karmada-io/karmada#quick-start), or by directly running the `hack/local-up-karmada.sh` script, which is also used to run our E2E cases.
+
+### Member cluster component is ready
+
+Ensure that all member clusters have been joined and that the corresponding karmada-scheduler-estimator of each cluster is installed in karmada-host.
+
+You can check this with the following commands:
+
+```bash
+# check whether the member cluster has been joined
+$ kubectl get cluster
+NAME       VERSION   MODE   READY   AGE
+member1    v1.19.1   Push   True    11m
+member2    v1.19.1   Push   True    11m
+member3    v1.19.1   Pull   True    5m12s
+
+# check whether the karmada-scheduler-estimator of each member cluster is running
+$ kubectl --context karmada-host get pod -n karmada-system | grep estimator
+karmada-scheduler-estimator-member1-696b54fd56-xt789   1/1     Running   0          77s
+karmada-scheduler-estimator-member2-774fb84c5d-md4wt   1/1     Running   0          75s
+karmada-scheduler-estimator-member3-5c7d87f4b4-76gv9   1/1     Running   0          72s
+```
+
+- If a cluster has not been joined yet, you can use `hack/deploy-agent-and-estimator.sh` to deploy both karmada-agent and karmada-scheduler-estimator.
+- If a cluster has already been joined, you can use `hack/deploy-scheduler-estimator.sh` to deploy only karmada-scheduler-estimator.
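+
+If you want to script this check, the following hypothetical bash helper (the name `missing_estimators` is made up here)
+compares the two listings above. It takes the output of `kubectl get cluster` and of the estimator pod listing, and
+prints every joined cluster that has no `Running` pod whose name starts with `karmada-scheduler-estimator-<cluster>-`.
+It assumes the estimator pods follow the naming pattern shown above and that STATUS is the third column.
+
+```bash
+# Hypothetical helper: print joined clusters without a Running estimator pod.
+# $1: output of `kubectl get cluster`
+# $2: output of `kubectl --context karmada-host get pod -n karmada-system`
+missing_estimators() {
+  local clusters c
+  # Skip the header row and keep the NAME column.
+  clusters=$(printf '%s\n' "$1" | awk 'NR > 1 && NF { print $1 }')
+  for c in $clusters; do
+    # Succeeds only if some pod named karmada-scheduler-estimator-<c>-... is Running.
+    if ! printf '%s\n' "$2" | awk -v p="karmada-scheduler-estimator-$c-" \
+        'index($1, p) == 1 && $3 == "Running" { found = 1 } END { exit !found }'; then
+      echo "$c"
+    fi
+  done
+}
+```
+
+A typical invocation would be
+`missing_estimators "$(kubectl get cluster)" "$(kubectl --context karmada-host get pod -n karmada-system)"`;
+empty output means every joined cluster has a running estimator.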
+
+### Scheduler option '--enable-scheduler-estimator'
+
+After all member clusters have been joined and all estimators are ready, specify the option `--enable-scheduler-estimator=true` to enable the scheduler estimator in karmada-scheduler.
+
+```bash
+# edit the deployment of karmada-scheduler
+$ kubectl --context karmada-host edit -n karmada-system deployments.apps karmada-scheduler
+```
+
+Then add the option `--enable-scheduler-estimator=true` to the command of the `karmada-scheduler` container.
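+
+Alternatively, the option can be added without an interactive editor by using `kubectl patch` with a JSON Patch. The
+hypothetical helper below (`append_command_patch` is a made-up name) prints a patch that appends one argument to the
+`command` list of a container in the pod template. It assumes that `karmada-scheduler` is the first container (index 0)
+and that its options are passed through `command` rather than `args`; verify both against your deployment first, and
+apply the patch only once so the option is not duplicated.
+
+```bash
+# Hypothetical helper: print a JSON Patch (RFC 6902) that appends an argument to
+# .spec.template.spec.containers[<index>].command; "/-" means "end of the array".
+# $1: container index, $2: argument (must not contain '"' or '\').
+append_command_patch() {
+  printf '[{"op":"add","path":"/spec/template/spec/containers/%s/command/-","value":"%s"}]\n' "$1" "$2"
+}
+
+# Usage (assumes karmada-scheduler is container 0):
+# kubectl --context karmada-host -n karmada-system patch deployment karmada-scheduler \
+#   --type=json -p "$(append_command_patch 0 --enable-scheduler-estimator=true)"
+```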
+
+### Descheduler has been installed
+
+Ensure that the karmada-descheduler has been installed in karmada-host.
+
+```bash
+$ kubectl --context karmada-host get pod -n karmada-system | grep karmada-descheduler
+karmada-descheduler-658648d5b-c22qf                    1/1     Running   0          80s
+```
+
+## Example
+
+Now let's build a scenario in which some replicas in a member cluster cannot be scheduled due to lack of resources.
+
+First, we create a deployment with 3 replicas and divide them among 3 member clusters.
+
+```yaml
+apiVersion: policy.karmada.io/v1alpha1
+kind: PropagationPolicy
+metadata:
+  name: nginx-propagation
+spec:
+  resourceSelectors:
+    - apiVersion: apps/v1
+      kind: Deployment
+      name: nginx
+  placement:
+    clusterAffinity:
+      clusterNames:
+        - member1
+        - member2
+        - member3
+    replicaScheduling:
+      replicaDivisionPreference: Weighted
+      replicaSchedulingType: Divided
+      weightPreference:
+        dynamicWeight: AvailableReplicas
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nginx
+  labels:
+    app: nginx
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: nginx
+  template:
+    metadata:
+      labels:
+        app: nginx
+    spec:
+      containers:
+      - image: nginx
+        name: nginx
+        resources:
+          requests:
+            cpu: "2"
+```
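+
+With `dynamicWeight: AvailableReplicas`, replicas are divided roughly in proportion to how many replicas each cluster
+can still accommodate, as reported by the estimators. The hypothetical bash sketch below illustrates that idea with a
+simple floor-then-remainder split. It is not Karmada's exact algorithm: the function name, the argument format and the
+rule of handing leftover replicas to clusters in argument order are all assumptions made for illustration.
+
+```bash
+# Hypothetical sketch: split <total> replicas in proportion to each cluster's
+# available replicas. Arguments after <total> are "<cluster>:<available>".
+divide_by_available() {
+  local total=$1 entry avail share sum=0 floors=0 left
+  shift
+  for entry in "$@"; do sum=$(( sum + ${entry#*:} )); done
+  if [ "$sum" -eq 0 ] || [ "$total" -gt "$sum" ]; then
+    echo "not enough available replicas" >&2
+    return 1
+  fi
+  # First pass: proportional floor for every cluster.
+  for entry in "$@"; do
+    floors=$(( floors + total * ${entry#*:} / sum ))
+  done
+  left=$(( total - floors ))
+  # Second pass: give leftover replicas, one each, to clusters that still have room.
+  for entry in "$@"; do
+    avail=${entry#*:}
+    share=$(( total * avail / sum ))
+    if [ "$left" -gt 0 ] && [ "$share" -lt "$avail" ]; then
+      share=$(( share + 1 ))
+      left=$(( left - 1 ))
+    fi
+    echo "${entry%%:*}: ${share}"
+  done
+}
+```
+
+For instance, if every cluster could still accommodate 4 replicas, `divide_by_available 3 member1:4 member2:4 member3:4`
+yields 1 replica per cluster, which matches the even split described next.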
+
+These 3 replicas may be divided evenly among the 3 member clusters, i.e. 1 replica in each cluster.
+Now we cordon all nodes in member1 (in this example, only `member1-control-plane`) and delete the replica's pod.
+
+```bash
+$ kubectl --context member1 cordon member1-control-plane
+$ kubectl --context member1 delete pod nginx-68b895fcbd-jgwz6
+```
+
+The Deployment controller in member1 creates a new pod, which `kube-scheduler` cannot schedule because no schedulable node has enough resources.
+
+```bash
+$ kubectl --context member1 get pod
+NAME                     READY   STATUS    RESTARTS   AGE
+nginx-68b895fcbd-fccg4   0/1     Pending   0          80s
+```
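+
+To spot such pods in a longer listing, a small hypothetical helper (the name `pending_pods` is made up) can filter the
+output of `kubectl get pod` by its STATUS column, assuming the default column layout shown above:
+
+```bash
+# Hypothetical helper: read `kubectl get pod` output on stdin and print the
+# names of pods whose STATUS (third column) is Pending.
+pending_pods() {
+  awk 'NR > 1 && $3 == "Pending" { print $1 }'
+}
+
+# Usage: kubectl --context member1 get pod | pending_pods
+```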
+
+After about 5 to 7 minutes, the pending replica in member1 is evicted and scheduled to another cluster with available resources.
+
+```bash
+$ kubectl --context member1 get pod
+No resources found in default namespace.
+$ kubectl --context member2 get pod
+NAME                     READY   STATUS    RESTARTS   AGE
+nginx-68b895fcbd-dgd4x   1/1     Running   0          6m3s
+nginx-68b895fcbd-nwgjn   1/1     Running   0          4s
+```
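+
+The 5 to 7 minute delay can be read as the sum of two waits. The sketch below assumes (this is not stated above) that a
+pending pod is only counted as unschedulable after it has stayed unschedulable for a threshold of about 5 minutes, and
+that the descheduler then notices it on one of its passes, which run every 2 minutes by default. Under that assumption
+the earliest eviction happens at the threshold and the latest one full descheduling period later:
+
+```bash
+# Hypothetical arithmetic: earliest and latest eviction time, in minutes.
+# $1: assumed unschedulable threshold, $2: descheduling interval.
+eviction_window() {
+  echo "$1-$(( $1 + $2 ))"
+}
+
+eviction_window 5 2   # prints 5-7
+```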
+