From 410e52883b6d01d9dfded0bd3bc5272534565d62 Mon Sep 17 00:00:00 2001
From: chaosi-zju
Date: Thu, 1 Feb 2024 21:45:18 +0800
Subject: [PATCH] proposal of the LazyActivation preference for Policy

Signed-off-by: chaosi-zju
---
 .../lazy-activation-preference.md | 462 ++++++++++++++++++
 1 file changed, 462 insertions(+)
 create mode 100644 docs/proposals/scheduling/activation-preference/lazy-activation-preference.md

diff --git a/docs/proposals/scheduling/activation-preference/lazy-activation-preference.md b/docs/proposals/scheduling/activation-preference/lazy-activation-preference.md
new file mode 100644
index 000000000..4b6dfb4fe
--- /dev/null
+++ b/docs/proposals/scheduling/activation-preference/lazy-activation-preference.md
@@ -0,0 +1,462 @@
+---
+title: Introduce a lazy activation preference to Policy
+authors:
+  - "@chaosi-zju"
+reviewers:
+  - "@RainbowMango"
+  - "@chaunceyjiang"
+  - "TBD"
+approvers:
+  - "@RainbowMango"
+  - "TBD"
+
+creation-date: 2024-01-30
+---
+
+# Introduce a lazy activation preference to Policy
+
+## Background
+
+In the scenario where `Policy` responsibilities are separated from `Resource` responsibilities,
+one role is dedicated to `Policy` management and is referred to as the cluster administrator,
+and the other role is dedicated to `Resource` management and is referred to as the user.
+The cluster administrator preconfigures some `Policies` for users, then users apply their own `Resources` and
+propagate them according to the preconfigured `Policy`.
+
+However, the administrator may later need to modify a `Policy`, for example to migrate resources to other clusters.
+Because `Policy` modifications currently take effect immediately, a single modification can affect the propagation status of
+a large number of resource templates at once and have a great impact on the system.
+
+The administrator is worried that such a modification, if it takes effect outside the change window, may cause business applications to fail,
+so the administrator hopes that `Policy` modifications can be deferred until
+the business application's change window.
+
+### Goals
+
+The most fundamental criterion is:
+
+* **Policy changes must not actively cause changes to the propagation status of resource templates.**
+
+Under this criterion, a delayed-activation mechanism is introduced into `Policy`,
+by which a referencing resource template defers responding to `Policy` changes until the resource template itself changes.
+
+> A change to a resource template means a modification to any field except `status`, `metadata.managedFields`, `metadata.resourceVersion`,
+> `metadata.generation`, or labels/annotations whose keys are prefixed with `.karmada.io`.
+
+### Applicable scenario
+
+This is an experimental feature that might help in scenarios where:
+
+* A policy manages a huge number of resource templates, so changes to the policy typically affect numerous applications simultaneously.
+* Policy responsibilities are separated from resource template responsibilities.
+
+In such scenarios, a minor misconfiguration could lead to widespread failures.
+With this feature, the change can be gradually rolled out through iterative modifications of resource templates.
+
+## Proposal
+
+### Overview
+
+We are introducing an `ActivationPreference` field to Policy, indicating how the referencing resource template will be propagated
+in case of policy changes.
+
+If empty, the resource template will respond to policy changes immediately; in other words,
+any policy change will drive the resource template to be propagated immediately as per the current propagation rules.
+
+If the value is `Lazy`, policy changes will not take effect for now but are deferred until the resource template changes;
+in other words, the resource template will not be propagated as per the current propagation rules until there is an update on it.
+
+### User Story
+
+If you prefer the `Lazy` activation preference, you can write your Policy as follows:
+
+```yaml
+apiVersion: policy.karmada.io/v1alpha1
+kind: PropagationPolicy
+metadata:
+  name: policy-delay
+spec:
+  activationPreference: Lazy # newly added field
+  resourceSelectors:
+    - apiVersion: apps/v1
+      kind: Deployment
+  placement:
+    clusterAffinity:
+      clusterNames:
+        - member1
+        - member2
+```
+
+### Scheme Rationality
+
+1) There is a snapshot of the Policy in the ResourceBinding that records the fields related to `Work` propagation,
+and the scheduler actually decides how to propagate `Work` based on this Policy snapshot in the ResourceBinding.
+Therefore, the `PropagationPolicy` can be understood as a preset policy, and the policy in the ResourceBinding as the effective Policy.
+When configuring a `PropagationPolicy`, users have good reason to decide when the preset configuration is synchronized to the effective Policy.
+It can be synchronized immediately, delayed until the resource changes, or even delayed to a fixed time in the future.
+Each of these choices is reasonable as long as there is a corresponding scenario.
+
+2) With the `Lazy` activation preference, the only change to the system logic is in the `reconcile` process
+of the resource template, where whether to refresh the binding depends on this field. If the resource template change is triggered
+by `Karmada` itself and the currently bound Policy is `Lazy`, the binding refresh is skipped. The change is small, clear, and makes sense.
+
+### Notes/Constraints/Caveats
+
+1) Based on the most fundamental criterion mentioned in [Goals](#goals), there are two special cases to note.
+
+* If you create a policy first and then create a matching resource, the resource will be propagated immediately.
+* If you create a resource first and then create a matching policy, the existing resource will not be propagated
+  until the resource is updated again.
+
+This behavior is in line with the criterion, but it may be inconvenient for operations and maintenance.
+For example, if you define many resources and policies in the same YAML file and apply them together, some resources may be propagated while others are not, depending on the order in which they are created.
+
+2) If a user's application implicitly or logically depends on a resource, then when the user changes the application,
+the resource it depends on also needs to be changed to ensure that the application is propagated as expected.
+
+3) With lazy activation, a Policy modification takes effect when the referencing resource template changes.
+However, some system plug-ins (such as CronHPA) may also change resource templates.
+Such plug-ins are also configured by the user, and their operations cannot be distinguished from direct user operations,
+so they are also regarded as user behavior, and resource template changes caused by them will also activate pending policy modifications.
+
+## Design Details
+
+### API change
+
+```go
+// PropagationSpec represents the desired behavior of PropagationPolicy.
+type PropagationSpec struct {
+	// ...
+	// ActivationPreference indicates how the referencing resource template will
+	// be propagated, in case of policy changes.
+	//
+	// If empty, the resource template will respond to policy changes
+	// immediately; in other words, any policy change will drive the resource
+	// template to be propagated immediately as per the current propagation rules.
+	//
+	// If the value is 'Lazy', policy changes will not take effect for now
+	// but are deferred until the resource template changes; in other words, the resource
+	// template will not be propagated as per the current propagation rules until
+	// there is an update on it.
+	// This is an experimental feature that might help in a scenario where a policy
+	// manages a huge number of resource templates, so changes to a policy typically
+	// affect numerous applications simultaneously. A minor misconfiguration
+	// could lead to widespread failures. With this feature, the change can be
+	// gradually rolled out through iterative modifications of resource templates.
+	//
+	// +kubebuilder:validation:Enum=Lazy
+	// +optional
+	ActivationPreference ActivationPreference `json:"activationPreference,omitempty"`
+	// ...
+}
+
+// ActivationPreference indicates how the referencing resource template will be propagated, in case of policy changes.
+type ActivationPreference string
+
+const (
+	// LazyActivation means policy changes will not take effect for now but are deferred until the resource template changes;
+	// in other words, the resource template will not be propagated as per the current propagation rules until
+	// there is an update on it.
+	LazyActivation ActivationPreference = "Lazy"
+)
+```
+
+### System Behavior Description
+
+The system behavior can be summarized in one sentence:
+
+* **When a user creates/deletes/modifies a Policy, if the currently active Policy is `Lazy`, the change will not take effect immediately.**
+
+To clearly determine the `currently active Policy` (e.g., when a resource switches from being bound to one policy to another,
+and both have an `ActivationPreference` field, which one prevails?), the following guideline is introduced:
+
+* **When a Policy changes, modifications involving the binding relationship between the Policy and resources are always processed immediately.
+  Only changes involving the propagation status of resources can be `Lazy`.**
+
+> That means if you modify fields like `resourceSelectors`, `priority`, or `preemption`,
+> those changes are processed immediately, adjusting the relationship between the Policy and the related resources.
+> If you modify fields like `placement`, `conflictResolution`, `failover`, or `propagateDeps`,
+> synchronizing those changes to the related ResourceBindings may be deferred if `activationPreference` is set to `Lazy`.
+
+The `Lazy` behavior is discussed case by case below:
+
+* If the Policy is created first, the Resource will be propagated by this Policy immediately once it is created.
+* If the Resource is created first and the Policy is then created/modified/deleted, there are four cases, as follows.
+
+#### Case 1: Policy still matches
+
+**Scenario:** the resource was originally managed by this Policy and is still managed by this Policy.
+
+**Default behavior:** the resource `reconcile` is triggered, and **the binding is refreshed**.
+
+**Lazy behavior:** the resource `reconcile` is triggered, but **the binding is not refreshed**.
+
+#### Case 2: Policy no longer matches or is deleted
+
+**Scenario:** the resource was originally managed by this Policy, but now the `resourceSelectors` have changed so that
+the Policy no longer matches the resource, or this Policy has been deleted.
+
+**Default behavior:** the resource is unbound from this Policy, its `reconcile` is triggered, and another matching Policy is looked up.
+If none is found, the propagation is kept unchanged; otherwise, the resource is bound to the new Policy and **the binding is refreshed**.
+
+**Lazy behavior:** the resource is unbound from this Policy, its `reconcile` is triggered, and another matching Policy is looked up.
+If none is found, the propagation is kept unchanged; otherwise, the resource is bound to the new Policy, and **the binding is not refreshed if the new Policy is `Lazy`**.
+
+#### Case 3: Policy preemption
+
+**Scenario:** the resource was originally managed by another policy, but now this policy takes over and manages it.
+
+**Default behavior:** the resource is unbound from the original policy and bound to this policy, its `reconcile` is triggered, and **the binding is refreshed**.
+
+**Lazy behavior:** the resource is unbound from the original policy and bound to this policy, its `reconcile` is triggered, but **the binding is not refreshed**.
+
+#### Case 4: Policy hit
+
+**Scenario:** the resource was not originally bound to any policy, but is now matched and managed by this policy.
+
+**Default behavior:** the Policy binds the matched resource, its `reconcile` is triggered, and **the binding is refreshed**.
+
+**Lazy behavior:** the Policy binds the matched resource, its `reconcile` is triggered, but **the binding is not refreshed**.
+
+## Test Plan
+
+### Simple Case 1 (Policy created before resource)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant PP
+    participant Karmada
+
+    PP ->> Karmada: create PP (match nginx, cluster=member1, lazy)
+    activate Karmada
+    nginx ->> Karmada: create nginx
+    Karmada -->> nginx: propagate it to member1
+    deactivate Karmada
+
+    PP ->> Karmada: delete PP
+    activate Karmada
+    Karmada -->> nginx: propagation unchanged
+    deactivate Karmada
+```
+
+### Simple Case 2 (Policy created after resource)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant PP
+    participant Karmada
+
+    nginx ->> Karmada: create nginx
+    activate Karmada
+    PP ->> Karmada: create PP (match nginx, cluster=member1, lazy)
+    Karmada -->> nginx: no propagation
+    deactivate Karmada
+
+    nginx ->> Karmada: update nginx (e.g. update a label: refresh-time=$(date +%s))
+    activate Karmada
+    Karmada -->> nginx: propagate it to member1
+    deactivate Karmada
+```
+
+### Simple Case 3 (Lazy to immediate)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant PP
+    participant Karmada
+
+    PP ->> Karmada: create PP (match nginx, cluster=member1, lazy)
+    activate Karmada
+    nginx ->> Karmada: create nginx
+    Karmada -->> nginx: propagate it to member1
+    deactivate Karmada
+
+    PP ->> Karmada: update PP (cluster=member2, remove lazy activationPreference field)
+    activate Karmada
+    Karmada -->> nginx: propagate it to member2
+    deactivate Karmada
+```
+
+### Simple Case 4 (Immediate to lazy)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant PP
+    participant Karmada
+
+    PP ->> Karmada: create PP (match nginx, cluster=member1, not lazy)
+    activate Karmada
+    nginx ->> Karmada: create nginx
+    Karmada -->> nginx: propagate it to member1
+    deactivate Karmada
+
+    PP ->> Karmada: update PP (cluster=member2, activationPreference=lazy)
+    activate Karmada
+    Karmada -->> nginx: propagation unchanged
+    deactivate Karmada
+
+    nginx ->> Karmada: update nginx (e.g. update a label: refresh-time=$(date +%s))
+    activate Karmada
+    Karmada -->> nginx: propagate it to member2
+    deactivate Karmada
+```
+
+### Combined Case 1 (Policy deleted)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant PP1
+    participant PP2
+    participant Karmada
+
+    PP1 ->> Karmada: create PP1 (match nginx, cluster=member1, lazy)
+    activate Karmada
+    nginx ->> Karmada: create nginx
+    Karmada -->> nginx: propagate it to member1
+    deactivate Karmada
+
+    PP2 ->> Karmada: create PP2 (match nginx, cluster=member2, lazy)
+    activate Karmada
+    PP1 ->> Karmada: delete PP1
+    Karmada -->> nginx: label of policy name changed, but propagation unchanged
+    nginx ->> Karmada: update nginx (e.g. update a label: refresh-time=$(date +%s))
+    Karmada -->> nginx: propagate it to member2
+    deactivate Karmada
+```
+
+### Combined Case 2 (Policy no longer matches)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant CPP1
+    participant CPP2
+    participant Karmada
+
+    CPP1 ->> Karmada: create CPP1 (match nginx, cluster=member1, lazy)
+    activate Karmada
+    nginx ->> Karmada: create nginx
+    Karmada -->> nginx: propagate it to member1
+    deactivate Karmada
+
+    CPP2 ->> Karmada: create CPP2 (match nginx, cluster=member2, lazy)
+    activate Karmada
+    CPP1 ->> Karmada: update CPP1 (no longer match nginx)
+    Karmada -->> nginx: label of policy name changed, but propagation unchanged
+    nginx ->> Karmada: update nginx (e.g. update a label: refresh-time=$(date +%s))
+    Karmada -->> nginx: propagate it to member2
+    deactivate Karmada
+```
+
+### Combined Case 3 (Policy preemption)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant PP2
+    participant CPP1
+    participant Karmada
+
+    CPP1 ->> Karmada: create CPP1 (match nginx, cluster=member1, lazy)
+    activate Karmada
+    nginx ->> Karmada: create nginx
+    Karmada -->> nginx: propagate it to member1
+    deactivate Karmada
+
+    PP2 ->> Karmada: create PP2 (match nginx, cluster=member2, not lazy, priority=2, preemption=true)
+    activate Karmada
+    Karmada -->> nginx: propagate it to member2
+    deactivate Karmada
+```
+
+### Combined Case 4 (Policy preemption)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant PP2
+    participant PP1
+    participant Karmada
+
+    PP1 ->> Karmada: create PP1 (match nginx, cluster=member1, lazy)
+    activate Karmada
+    nginx ->> Karmada: create nginx
+    Karmada -->> nginx: propagate it to member1
+    deactivate Karmada
+
+    PP2 ->> Karmada: create PP2 (match nginx, cluster=member2, lazy, priority=2, preemption=true)
+    activate Karmada
+    Karmada -->> nginx: label of policy name changed, but propagation unchanged
+    nginx ->> Karmada: update nginx (e.g. update a label: refresh-time=$(date +%s))
+    Karmada -->> nginx: propagate it to member2
+    deactivate Karmada
+```
+
+### Combined Case 5 (Propagate dependencies)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant PP
+    participant configmap
+    participant Karmada
+
+    PP ->> Karmada: create PP (match nginx, cluster=member1, lazy, propagateDeps=true)
+    activate Karmada
+    configmap ->> Karmada: create configmap
+    Karmada -->> configmap: no propagation
+    nginx ->> Karmada: create nginx
+    Karmada -->> nginx: propagate it to member1
+    Karmada -->> configmap: propagate it to member1
+    deactivate Karmada
+
+    PP ->> Karmada: update PP (change cluster to member2)
+    activate Karmada
+    Karmada -->> nginx: propagation unchanged
+    Karmada -->> configmap: propagation unchanged
+    nginx ->> Karmada: update nginx (e.g. update a label: refresh-time=$(date +%s))
+    Karmada -->> nginx: propagate it to member2
+    Karmada -->> configmap: propagate it to member2
+    deactivate Karmada
+```
+
+### Combined Case 6 (Propagate dependencies)
+
+```mermaid
+sequenceDiagram
+    participant nginx
+    participant PP
+    participant configmap
+    participant Karmada
+
+    nginx ->> Karmada: create nginx
+    activate Karmada
+    configmap ->> Karmada: create configmap
+    PP ->> Karmada: create PP (match nginx and configmap, cluster=member1, lazy, propagateDeps=false)
+    Karmada -->> nginx: no propagation
+    Karmada -->> configmap: no propagation
+    deactivate Karmada
+
+    nginx ->> Karmada: update nginx (e.g. update a label: refresh-time=$(date +%s))
+    activate Karmada
+    Karmada -->> nginx: propagate it to member1
+    Karmada -->> configmap: no propagation
+    deactivate Karmada
+
+    PP ->> Karmada: update PP (change cluster to member2, propagateDeps to true)
+    activate Karmada
+    Karmada -->> nginx: propagation unchanged
+    Karmada -->> configmap: no propagation
+    deactivate Karmada
+
+    nginx ->> Karmada: update nginx (e.g. update a label: refresh-time=$(date +%s))
+    activate Karmada
+    Karmada -->> nginx: propagate it to member2
+    Karmada -->> configmap: propagate it to member2
+    deactivate Karmada
+```