Proposal to introduce a rebalance mechanism that actively triggers rescheduling of resources.
Signed-off-by: chaosi-zju <chaosi@zju.edu.cn>
---
title: Introduce a rebalance mechanism to actively trigger rescheduling of resource.
authors:
- "@chaosi-zju"
reviewers:
- "@RainbowMango"
- "@chaunceyjiang"
- "@wu0407"
approvers:
- "@RainbowMango"

creation-date: 2024-01-30
---

# Introduce a mechanism to actively trigger rescheduling

## Background

In the current Karmada scheduler, once the replicas of a workload are scheduled, the scheduling result stays inert
and the replicas distribution does not change. Even if rescheduling is triggered by modifying replicas or placement,
the scheduler keeps the existing replicas distribution as closely as possible, only making minimal adjustments when necessary,
which minimizes disruption and preserves the balance across clusters.

However, in some scenarios, users want an approach to actively trigger a fresh rescheduling, one that disregards the
previous assignment entirely and establishes an entirely new replicas distribution across clusters.

### Motivation

Assume the user has propagated workloads to member clusters. In some scenarios the current replicas distribution
is not the most desirable, for example:

* replicas migrated due to cluster failover, but the failed cluster has now recovered.
* replicas migrated due to application-level failover, but now each cluster has sufficient resources to run the replicas.
* for the `Aggregated` schedule strategy, replicas were initially distributed across multiple clusters due to resource
  constraints, but now one cluster is enough to accommodate all replicas.

Therefore, the user desires an approach to trigger rescheduling so that the replicas distribution can be rebalanced.

### Goals

Introduce a rebalance mechanism to actively trigger rescheduling of resources.

## Proposal

* **Introduce a configurable field into ResourceBinding; when it changes, the scheduler will perform a `Fresh` mode
  rescheduling.**

> In contrast to the existing rescheduling behavior, such as that triggered by modifying replicas or placement,
> which keeps the existing replicas distribution as closely as possible, this rescheduling disregards the previous
> assignment entirely and seeks to establish an entirely new replica distribution across clusters.
>
> We call the former assignment mode `Steady` and the latter `Fresh`.

* **Introduce a new API by which users can actively adjust workload balance.**

> Since directly manipulating bindings is not a recommended, user-friendly way, it is better to design a new API
> specifically for adjusting workload balance. Currently, it is mainly considered for the rescheduling scenario.
> In the future, it may expand to more workload rebalance scenarios, such as migration, rollback, and so on,
> with different assignment modes and rolling modes specified.

### User story

#### Story 1

In a cluster failover scenario, replicas are distributed across two clusters, member1 and member2; however, they would all migrate to
the member2 cluster if the member1 cluster fails.

As a cluster administrator, I hope the replicas are redistributed to both clusters when the member1 cluster recovers, so that
the resources of the member1 cluster are re-utilized, and also for the sake of high availability.

#### Story 2

In application-level failover, low-priority applications may be preempted, shrinking from multiple clusters
to a single cluster because cluster resources are in short supply
(refer to [Application-level Failover](https://karmada.io/docs/next/userguide/failover/application-failover#why-application-level-failover-is-required)).

As a user, I hope the replicas of low-priority applications can be redistributed to multiple clusters when
cluster resources become sufficient, to ensure the high availability of the application.

#### Story 3

With the `Aggregated` schedule type, replicas may still be distributed across multiple clusters due to resource constraints.

As a user, I hope the replicas are redistributed in an aggregated manner once any single cluster has
sufficient resources to accommodate all replicas, so that the application better meets actual business requirements.

#### Story 4

In a disaster-recovery scenario, replicas migrate from the primary cluster to the backup cluster when the primary cluster fails.

As a cluster administrator, I hope that replicas can migrate back when the primary cluster is restored, so that:

1. the disaster-recovery mode is restored, ensuring the reliability and stability of the cluster federation.
2. the cost of the backup cluster is saved.

### Notes/Constraints/Caveats

This ability is limited to triggering a workload rebalance; the schedule result will be recalculated according to the
`Placement` in the current ResourceBinding. That means:

* Taking [story 1](#story-1) as an example, rescheduling happens when the cluster recovers, but the new schedule result is not
  guaranteed to be exactly the same as before the cluster failure; it is only guaranteed that the new schedule result meets
  the current `Placement`.

* The rebalance is based on the `Placement` in the current ResourceBinding, not the PropagationPolicy. So if the activation preference
  of your PropagationPolicy is `Lazy`, the rescheduling is still based on the previous `ResourceBinding` even if the current policy has been changed.

## Design Details

### API change

* As for *Introduce a configurable field into resource binding*, the detailed description is as follows:

```go
// ResourceBindingSpec represents the expectation of ResourceBinding.
type ResourceBindingSpec struct {
    ...

    // RescheduleTriggeredAt is a timestamp representing when the referenced resource is triggered rescheduling.
    // When this field is updated, it means a rescheduling is manually triggered by user, and the expected behavior
    // of this action is to do a complete recalculation without referring to last scheduling results.
    // It works with the status.lastScheduledTime field, and only when this timestamp is later than timestamp in
    // status.lastScheduledTime will the rescheduling actually execute, otherwise, ignored.
    //
    // It is represented in RFC3339 form (like '2006-01-02T15:04:05Z') and is in UTC.
    // +optional
    RescheduleTriggeredAt *metav1.Time `json:"rescheduleTriggeredAt,omitempty"`

    ...
}

// ResourceBindingStatus represents the overall status of the strategy as well as the referenced resources.
type ResourceBindingStatus struct {
    ...

    // LastScheduledTime representing the latest timestamp when scheduler successfully finished a scheduling.
    // It is represented in RFC3339 form (like '2006-01-02T15:04:05Z') and is in UTC.
    // +optional
    LastScheduledTime *metav1.Time `json:"lastScheduledTime,omitempty"`

    ...
}
```

* As for *Introduce a new API by which users can actively adjust workload balance*, we define a new API
  named `WorkloadRebalancer` in a new apiGroup `apps.karmada.io/v1alpha1`:

```go
// +genclient
// +genclient:nonNamespaced
// +kubebuilder:resource:path=workloadrebalancers,scope="Cluster"
// +kubebuilder:subresource:status
// +kubebuilder:storageversion
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// WorkloadRebalancer represents the desired behavior and status of a job which can enforce a resource rebalance.
type WorkloadRebalancer struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    // Spec represents the specification of the desired behavior of WorkloadRebalancer.
    // +required
    Spec WorkloadRebalancerSpec `json:"spec"`

    // Status represents the status of WorkloadRebalancer.
    // +optional
    Status WorkloadRebalancerStatus `json:"status,omitempty"`
}

// WorkloadRebalancerSpec represents the specification of the desired behavior of WorkloadRebalancer.
type WorkloadRebalancerSpec struct {
    // Workloads used to specify the list of expected resource.
    // Nil or empty list is not allowed.
    // +kubebuilder:validation:MinItems=1
    // +required
    Workloads []ObjectReference `json:"workloads"`

    // TTLSecondsAfterFinished limits the lifetime of a WorkloadRebalancer that has finished execution (means each
    // target workload is finished with result of Successful or Failed).
    // If this field is set, ttlSecondsAfterFinished after the WorkloadRebalancer finishes, it is eligible to be automatically deleted.
    // If this field is unset, the WorkloadRebalancer won't be automatically deleted.
    // If this field is set to zero, the WorkloadRebalancer becomes eligible to be deleted immediately after it finishes.
    // +optional
    TTLSecondsAfterFinished *int32 `json:"ttlSecondsAfterFinished,omitempty"`
}

// ObjectReference the expected resource.
type ObjectReference struct {
    // APIVersion represents the API version of the target resource.
    // +required
    APIVersion string `json:"apiVersion"`

    // Kind represents the Kind of the target resource.
    // +required
    Kind string `json:"kind"`

    // Name of the target resource.
    // +required
    Name string `json:"name"`

    // Namespace of the target resource.
    // Default is empty, which means it is a non-namespace-scoped resource.
    // +optional
    Namespace string `json:"namespace,omitempty"`
}

// WorkloadRebalancerStatus contains information about the current status of a WorkloadRebalancer
// updated periodically by schedule trigger controller.
type WorkloadRebalancerStatus struct {
    // ObservedWorkloads contains information about the execution states and messages of target resources.
    // +optional
    ObservedWorkloads []ObservedWorkload `json:"observedWorkloads,omitempty"`

    // ObservedGeneration is the generation(.metadata.generation) observed by the controller.
    // If ObservedGeneration is less than the generation in metadata means the controller hasn't confirmed
    // the rebalance result or hasn't done the rebalance yet.
    // +optional
    ObservedGeneration int64 `json:"observedGeneration,omitempty"`

    // FinishTime represents the finish time of rebalancer.
    // +optional
    FinishTime *metav1.Time `json:"finishTime,omitempty"`
}

// ObservedWorkload the observed resource.
type ObservedWorkload struct {
    // Workload the observed resource.
    // +required
    Workload ObjectReference `json:"workload"`

    // Result the observed rebalance result of resource.
    // +optional
    Result RebalanceResult `json:"result,omitempty"`

    // Reason represents a machine-readable description of why this resource rebalanced failed.
    // +optional
    Reason RebalanceFailedReason `json:"reason,omitempty"`
}

// RebalanceResult the specific extent to which the resource has been rebalanced.
type RebalanceResult string

const (
    // RebalanceFailed the resource failed to be rebalanced.
    RebalanceFailed RebalanceResult = "Failed"
    // RebalanceSuccessful the resource has been successfully rebalanced.
    RebalanceSuccessful RebalanceResult = "Successful"
)

// RebalanceFailedReason represents a machine-readable description of why this resource rebalanced failed.
type RebalanceFailedReason string

const (
    // RebalanceObjectNotFound the resource referenced binding not found.
    RebalanceObjectNotFound RebalanceFailedReason = "ReferencedBindingNotFound"
)
```

### Interpretation of Realization by an Example

#### Step 1: apply the WorkloadRebalancer resource YAML.

Assuming there are two Deployments named `demo-deploy-1` and `demo-deploy-2` and a ClusterRole named `demo-role`, and
the user wants to trigger their rescheduling, they just need to apply the following YAML:

```yaml
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  name: demo
spec:
  workloads:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-1
      namespace: default
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      name: demo-role
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-2
      namespace: default
```

> Notes on the `workloads` field:
> 1. the `name` sub-field is required;
> 2. the `namespace` sub-field is required for a namespace-scoped resource and must be empty for a cluster-scoped resource.

This API specifies a batch of resources that need rescheduling. The user will get a `workloadrebalancer.apps.karmada.io/demo created`
result, which means the resource was created successfully.

#### Step 2: the controller watches the new API resource and does the rescheduling work.

The controller then triggers the rescheduling of each resource by writing the `CreationTimestamp` of the WorkloadRebalancer
to the new `spec.rescheduleTriggeredAt` field of each resource binding. Taking `deployment/demo-deploy-1` as an example,
you will see its resource binding modified to:

```yaml
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: demo-deploy-1-deployment
  namespace: default
spec:
  rescheduleTriggeredAt: "2024-04-17T15:04:05Z"  # this field is updated to the CreationTimestamp of the WorkloadRebalancer
  ...
status:
  lastScheduledTime: "2024-04-17T15:00:05Z"
```

Since the `rescheduleTriggeredAt` field has been updated and is later than `lastScheduledTime`, rescheduling is triggered.
If it succeeds, the `lastScheduledTime` field is updated again, which indicates the scheduler has finished a rescheduling
(if it fails, the scheduler retries), as shown below:

```yaml
apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  name: demo-deploy-1-deployment
  namespace: default
spec:
  rescheduleTriggeredAt: "2024-04-17T15:04:05Z"
  ...
status:
  lastScheduledTime: "2024-04-17T15:04:05Z"
  conditions:
    - ...
    - lastTransitionTime: "2024-04-17T15:00:05Z"
      message: Binding has been scheduled successfully.
      reason: Success
      status: "True"
      type: Scheduled
    - lastTransitionTime: "2024-04-17T15:04:05Z"
      message: All works have been successfully applied
      reason: FullyAppliedSuccess
      status: "True"
      type: FullyApplied
```

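The trigger condition described above can be summarized in a short sketch. This is not the actual scheduler code; the helper name `rescheduleActivated` is assumed purely for illustration, but the comparison follows the field semantics defined in the API change section.

```go
package scheduler

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// rescheduleActivated reports whether a Fresh-mode rescheduling should run:
// spec.rescheduleTriggeredAt must be set and later than status.lastScheduledTime.
func rescheduleActivated(rescheduleTriggeredAt, lastScheduledTime *metav1.Time) bool {
    if rescheduleTriggeredAt == nil {
        // never triggered manually: keep the Steady assignment mode
        return false
    }
    if lastScheduledTime == nil {
        // no successful scheduling recorded yet: the trigger takes effect
        return true
    }
    // only a trigger newer than the last successful scheduling takes effect;
    // an older one has already been handled and is ignored
    return rescheduleTriggeredAt.After(lastScheduledTime.Time)
}
```

Once such a rescheduling succeeds, the scheduler refreshes `status.lastScheduledTime`, so the same trigger timestamp has no further effect.
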
Finally, once all works have been successfully applied, the user will observe changes in the actual distribution of the resource
template; the user can also see several recorded events on the resource template, like:

```shell
$ kubectl --context karmada-apiserver describe deployment demo-deploy-1
...
Events:
  Type    Reason                  Age                From                                Message
  ----    ------                  ----               ----                                -------
  ...
  Normal  ScheduleBindingSucceed  31s                default-scheduler                   Binding has been scheduled successfully. Result: {member2:2, member1:1}
  Normal  GetDependenciesSucceed  31s                dependencies-distributor            Get dependencies([]) succeed.
  Normal  SyncSucceed             31s                execution-controller                Successfully applied resource(default/demo-deploy-1) to cluster member1
  Normal  AggregateStatusSucceed  31s (x4 over 31s)  resource-binding-status-controller  Update resourceBinding(default/demo-deploy-1-deployment) with AggregatedStatus successfully.
  Normal  SyncSucceed             31s                execution-controller                Successfully applied resource(default/demo-deploy-1) to cluster member2
```

#### Step 3: check the status of WorkloadRebalancer.

The user can observe the rebalance result in `status.observedWorkloads` of `workloadrebalancer/demo`, like:

```yaml
apiVersion: apps.karmada.io/v1alpha1
kind: WorkloadRebalancer
metadata:
  creationTimestamp: "2024-04-17T15:04:05Z"
  name: demo
spec:
  workloads:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-1
      namespace: default
    - apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      name: demo-role
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-deploy-2
      namespace: default
status:
  observedWorkloads:
    - result: Successful
      workload:
        apiVersion: apps/v1
        kind: Deployment
        name: demo-deploy-1
        namespace: default
    - reason: ReferencedBindingNotFound
      result: Failed
      workload:
        apiVersion: apps/v1
        kind: Deployment
        name: demo-deploy-2
        namespace: default
    - result: Successful
      workload:
        apiVersion: rbac.authorization.k8s.io/v1
        kind: ClusterRole
        name: demo-role
```

> Notes:
> 1. `observedWorkloads` is sorted in increasing dictionary order of the combined string `apiVersion/kind/namespace/name`.
> 2. if the binding referenced by a workload is not found, the workload is marked as `Failed` without retry.
> 3. if a workload fails to be rebalanced due to an occasional network error, the controller retries, and its `result` and `reason`
>    fields are left empty until it succeeds.

### How to update this resource

When the `spec` field of a WorkloadRebalancer is updated, we shall refresh the workload list in `status.observedWorkloads`
(a minimal sketch of this refresh follows the list):

* if a new workload is added to the spec list, add it to the status list too and do the rebalance.
* if a workload is deleted from the previous spec list, keep it in the status list if it has already succeeded, and remove it otherwise.
* if a workload is modified, regard it as deleting the old entry and inserting a new one.
* if the modification only adjusts the list order, no additional action is needed, since `observedWorkloads` is arranged in increasing dictionary order.

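The following is a minimal, self-contained sketch of the refresh rules above. It uses simplified copies of the proposal's types rather than the real generated API, and the helper names `refreshObservedWorkloads` and `workloadKey` are assumptions for illustration only.

```go
package rebalancer

import (
    "sort"
    "strings"
)

// Simplified stand-ins for the proposal's types, kept local so the sketch is self-contained.
type ObjectReference struct{ APIVersion, Kind, Namespace, Name string }

type RebalanceResult string

const RebalanceSuccessful RebalanceResult = "Successful"

type ObservedWorkload struct {
    Workload ObjectReference
    Result   RebalanceResult
}

// workloadKey builds the combined string used for the dictionary ordering of observedWorkloads.
func workloadKey(o ObjectReference) string {
    return strings.Join([]string{o.APIVersion, o.Kind, o.Namespace, o.Name}, "/")
}

// refreshObservedWorkloads rebuilds status.observedWorkloads after spec.workloads changed,
// following the rules listed above.
func refreshObservedWorkloads(desired []ObjectReference, observed []ObservedWorkload) []ObservedWorkload {
    previous := make(map[string]ObservedWorkload, len(observed))
    for _, ow := range observed {
        previous[workloadKey(ow.Workload)] = ow
    }

    refreshed := make([]ObservedWorkload, 0, len(desired))
    kept := make(map[string]bool, len(desired))
    for _, w := range desired {
        k := workloadKey(w)
        kept[k] = true
        if ow, ok := previous[k]; ok {
            refreshed = append(refreshed, ow) // unchanged entry keeps its recorded result
        } else {
            refreshed = append(refreshed, ObservedWorkload{Workload: w}) // new entry, result left empty until rebalanced
        }
    }

    // entries removed from spec stay in status only if they already succeeded
    for _, ow := range observed {
        if !kept[workloadKey(ow.Workload)] && ow.Result == RebalanceSuccessful {
            refreshed = append(refreshed, ow)
        }
    }

    // keep increasing dictionary order of apiVersion/kind/namespace/name
    sort.Slice(refreshed, func(i, j int) bool {
        return workloadKey(refreshed[i].Workload) < workloadKey(refreshed[j].Workload)
    })
    return refreshed
}
```
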
### How to auto clean resource

Referring to [Automatic Cleanup for Finished Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/),
we introduce the field `ttlSecondsAfterFinished`, which limits the lifetime of a WorkloadRebalancer that has finished execution
(finished execution means each target workload has finished with a result of `Successful` or `Failed`).

* If this field is set, the WorkloadRebalancer becomes eligible to be automatically deleted `ttlSecondsAfterFinished` seconds after it finishes.
* If this field is unset, the WorkloadRebalancer won't be automatically deleted.
* If this field is set to zero, the WorkloadRebalancer becomes eligible to be deleted immediately after it finishes.

We consider several corner cases:

* case 1: if a new target workload is added to the `WorkloadRebalancer` before `ttlSecondsAfterFinished` expires,
  the finish time of the `WorkloadRebalancer` is refreshed, so the `delete` action is deferred since the expiry time is refreshed too.
* case 2: if `ttlSecondsAfterFinished` is modified before it expires,
  the `delete` action should be performed according to the latest `ttlSecondsAfterFinished`.
* case 3: when we have fetched and checked the latest `WorkloadRebalancer` object and try to delete it,
  if a modification to the `WorkloadRebalancer` occurs right between these two points in time, the pending `delete` action should be interrupted.

Several key implementation points (a sketch follows this list):

* A `WorkloadRebalancer` is judged as finished when it meets two requirements:
  * all expected workloads have finished with a result of `Successful` or `Failed`.
  * a new field named `ObservedGeneration` is introduced to the `Status` of WorkloadRebalancer, and it must be equal to `.metadata.generation`,
    to prevent the case where the WorkloadRebalancer has been updated but the controller hasn't refreshed its `Status` yet.
* When a `WorkloadRebalancer` is `Created` or `Updated`, add it to the workqueue, calculate its expiry time, and
  call the `workqueue.AddAfter()` function to re-enqueue it once more if it hasn't expired.
* Before deleting the `WorkloadRebalancer`, do a final sanity check. Use the latest `WorkloadRebalancer` fetched directly
  from the API server to verify that the TTL has truly expired, rather than the object from the lister cache.
* When deleting the `WorkloadRebalancer`, confirm that the `resourceVersion` of the deleted object is as expected,
  to prevent the above corner case 3.

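A minimal sketch of this TTL handling, assuming a simplified snapshot type `rebalancerTTLInfo` and a helper `processTTL` (both hypothetical); the actual controller works with the real WorkloadRebalancer object and client, and performs the final delete with a `resourceVersion` precondition as noted above.

```go
package rebalancer

import (
    "time"

    "k8s.io/client-go/util/workqueue"
)

// rebalancerTTLInfo is a hypothetical snapshot carrying only what the TTL logic needs.
type rebalancerTTLInfo struct {
    Name                    string
    Finished                bool      // all workloads Successful/Failed and observedGeneration == generation
    FinishTime              time.Time // status.finishTime
    TTLSecondsAfterFinished *int32    // spec.ttlSecondsAfterFinished
}

// processTTL decides whether a WorkloadRebalancer should be deleted now, or re-enqueued
// so it is revisited once its TTL expires.
func processTTL(queue workqueue.DelayingInterface, r rebalancerTTLInfo, now time.Time) (deleteNow bool) {
    if !r.Finished || r.TTLSecondsAfterFinished == nil {
        // not finished yet, or no TTL set: never auto-delete
        return false
    }
    expireAt := r.FinishTime.Add(time.Duration(*r.TTLSecondsAfterFinished) * time.Second)
    if remaining := expireAt.Sub(now); remaining > 0 {
        // not expired yet: re-enqueue so the object is checked again right after expiry
        queue.AddAfter(r.Name, remaining)
        return false
    }
    // expired: the caller should re-fetch the latest object from the API server, re-check,
    // and delete with a resourceVersion precondition to handle corner case 3
    return true
}
```
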
### How to prevent application from being out-of-service

As for the disaster-recovery scenario mentioned in [story 4](#story-4) above, after the primary cluster recovers and a reschedule
has been triggered, if the new replicas in the primary cluster become ready later than the old replicas are removed from the backup cluster,
there may be no ready replica in the cluster federation and the application will be out of service. So, how do we prevent
the application from being out of service?

This will be discussed and implemented separately in another proposal.