[Federated hpa design] Add some clarity and details

Irfan Ur Rehman 2017-07-10 19:16:38 +05:30
parent ba53fa0805
commit cf49795946
1 changed file with 61 additions and 20 deletions


@@ -48,15 +48,15 @@ scaling of the pods per cluster then is controlled by the associated HPA.
# Alternative approaches
## Design Alternative 1
Make the autoscaling resource available and implement support for
horizontalpodautoscalers objects at federation. The HPA API resource
will need to be exposed at the federation level, which can follow a
version similar to the one implemented in the latest k8s cluster release.
Once the HPA object is created at federation, the federation controller
creates and monitors a similar HPA object (partitioning the min and max values)
in each of the federated clusters. Based on the metadata in the spec of the HPA
describing the scaleTargetRef, the HPA will be applied on the already existing
target objects. If the target object is not present in the cluster (either
@@ -87,9 +87,11 @@ replicaset object as described
One of the points to note here is that doing this brings two points of control over the
number of replicas of the target object: one by the federated target object (rs or
deployment) and the other by the HPA local to the federated cluster. The solution to this
is discussed in the following section. Another note here is that the preferences would
consider the use of only minReplicas and maxReplicas in this phase of implementation;
weights will be discarded for this alternative design.
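To make the bound partitioning concrete, below is a minimal, hypothetical Go sketch of
how a federated HPA's minReplicas and maxReplicas could be split evenly across member
clusters when weights are ignored. The names (`partitionHPA`, `clusterHPASpec`) are
assumptions made for illustration and are not part of any actual federation controller code.

```go
// Hypothetical sketch only: evenly partition a federated HPA's replica bounds
// across member clusters, ignoring weights (as proposed for this phase).
package main

import "fmt"

// clusterHPASpec holds the per-cluster bounds derived from the federated HPA.
type clusterHPASpec struct {
	MinReplicas int32
	MaxReplicas int32
}

// partitionHPA splits fedMin and fedMax across the given clusters as evenly as
// possible, handing out any remainder one replica at a time.
func partitionHPA(fedMin, fedMax int32, clusters []string) map[string]clusterHPASpec {
	out := make(map[string]clusterHPASpec, len(clusters))
	n := int32(len(clusters))
	if n == 0 {
		return out
	}
	for i, c := range clusters {
		spec := clusterHPASpec{
			MinReplicas: fedMin / n,
			MaxReplicas: fedMax / n,
		}
		// Spread remainders over the first few clusters.
		if int32(i) < fedMin%n {
			spec.MinReplicas++
		}
		if int32(i) < fedMax%n {
			spec.MaxReplicas++
		}
		out[c] = spec
	}
	return out
}

func main() {
	// e.g. cluster-a:{1 4} cluster-b:{1 3} cluster-c:{0 3}
	fmt.Println(partitionHPA(2, 10, []string{"cluster-a", "cluster-b", "cluster-c"}))
}
```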
### Rebalancing of workload replicas and control over the same
The current implementation of federated replicasets (and deployments) first
distributes the replicas into underlying clusters and then monitors the status
@@ -108,6 +110,33 @@ and deployment controllers) reconcile process, would stop updating and/or
rebalancing the replicas in and across the underlying clusters. The reconcile
of the objects (rs or deployment) would still continue, to handle the scenario
of the object missing from any given federated cluster.
The mechanism to achieve this behaviour shall be as below:
- User creates a workload object (for example rs) in federation.
- User then creates an HPA object in federation (this step and the previous
step can happen in either order).
- The rs object will exist in the federation control plane with or without
user preferences and/or cluster selection annotations.
- The HPA controller will first evaluate which cluster(s) get the replicas
and which don't (if any). This list of clusters will be a subset of the
clusters matched by the cluster selector already applied on the HPA object.
- The HPA controller will apply this list on the federated rs object as the
cluster selection annotation, overriding the user-provided preferences (if any).
Control over the placement of workload replicas and the additional preferences
will thus lie completely with the HPA objects. This is an important assumption
that users of these interacting federated objects should be aware of; if a user
needs to place replicas in specific clusters together with workload autoscaling,
those preferences should be applied on the HPA object. Any preferences applied
on the workload object (rs or deployment) will be overridden.
- The target workload object's (for example rs) replicas will be left unchanged
in a cluster which already has replicas; created with one replica in a cluster
which does not have the object but for which the HPA calculation resulted in
some replicas; and deleted from a cluster which has replicas but for which the
federated HPA calculation results in no replicas (see the sketch after this list).
- The desired replicas per cluster, as computed by the federated HPA dynamic
rebalance mechanism elaborated in the next section, will be set on each cluster's
local HPA, which in turn will set the same on the local target object.
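To summarise the per-cluster handling of the target object described in the list above,
here is a minimal, hypothetical Go sketch of the keep/create/delete decision. The
`decideTargetAction` helper and its names are assumptions made for illustration, not
actual federation controller code.

```go
// Hypothetical sketch of the per-cluster decision: keep the target object
// untouched where replicas already exist, create it with one replica where the
// federated HPA assigned replicas but none exist, and delete it where the
// federated HPA assigned none.
package main

import "fmt"

type action string

const (
	keepUnchanged    action = "keep-unchanged"
	createOneReplica action = "create-with-1-replica"
	deleteObject     action = "delete"
	noAction         action = "no-action"
)

// decideTargetAction returns what the federation controller would do with the
// target workload object (e.g. an rs) in a single cluster.
//   hasReplicas      - the cluster already runs replicas of the target object.
//   hpaWantsReplicas - the federated HPA calculation assigned replicas to it.
func decideTargetAction(hasReplicas, hpaWantsReplicas bool) action {
	switch {
	case hasReplicas && hpaWantsReplicas:
		return keepUnchanged
	case !hasReplicas && hpaWantsReplicas:
		return createOneReplica
	case hasReplicas && !hpaWantsReplicas:
		return deleteObject
	default:
		return noAction
	}
}

func main() {
	fmt.Println(decideTargetAction(true, false)) // delete
	fmt.Println(decideTargetAction(false, true)) // create-with-1-replica
}
```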
### Dynamic HPA min/max rebalance
@@ -138,10 +167,10 @@ _CurrentReplicas == DesiredReplicas == MinReplicas_ and the observed average res
metric usage (on the HPA) is less than a given threshold, to those clusters
where _DesiredReplicas > MinReplicas_.
However, as stated in 3 above, the approach of distribution will first be implemented
only for _MaxReplicas_ to establish its utility, before implementing the same for _MinReplicas_.
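As an illustration of this rebalance step, the hypothetical Go sketch below moves spare
_MaxReplicas_ capacity from idle clusters to constrained ones. The types, field names and
the usage threshold are assumptions for the example, not the proposed controller code.

```go
// Hypothetical sketch: clusters whose local HPA is idle (CurrentReplicas ==
// DesiredReplicas == MinReplicas and average metric usage below a threshold)
// donate one unit of MaxReplicas to clusters whose local HPA wants to grow
// (DesiredReplicas > MinReplicas).
package main

import "fmt"

type localHPAStatus struct {
	MinReplicas     int32
	MaxReplicas     int32
	CurrentReplicas int32
	DesiredReplicas int32
	AvgMetricUsage  float64 // e.g. average CPU utilisation ratio
}

// rebalanceMax pairs each idle (donor) cluster with a constrained (receiver)
// cluster and shifts one unit of MaxReplicas between them.
func rebalanceMax(status map[string]*localHPAStatus, usageThreshold float64) {
	var donors, receivers []string
	for name, s := range status {
		switch {
		case s.CurrentReplicas == s.DesiredReplicas &&
			s.DesiredReplicas == s.MinReplicas &&
			s.AvgMetricUsage < usageThreshold &&
			s.MaxReplicas > s.MinReplicas:
			donors = append(donors, name)
		case s.DesiredReplicas > s.MinReplicas:
			receivers = append(receivers, name)
		}
	}
	for i, d := range donors {
		if i >= len(receivers) {
			break
		}
		status[d].MaxReplicas--
		status[receivers[i]].MaxReplicas++
	}
}

func main() {
	clusters := map[string]*localHPAStatus{
		"idle": {MinReplicas: 1, MaxReplicas: 5, CurrentReplicas: 1, DesiredReplicas: 1, AvgMetricUsage: 0.10},
		"busy": {MinReplicas: 1, MaxReplicas: 5, CurrentReplicas: 5, DesiredReplicas: 5, AvgMetricUsage: 0.95},
	}
	rebalanceMax(clusters, 0.5)
	fmt.Println(clusters["idle"].MaxReplicas, clusters["busy"].MaxReplicas) // 4 6
}
```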
## Design Alternative 2
As in the previous alternative, the API will need to be exposed at federation.
@@ -185,23 +214,35 @@ what is probably better than not having the replica at all.
# Other Scenario
Other scenarios, for example rolling updates (when the user updates the deployment or RS)
or recreation of the object (when the user specifies the strategy as recreate while updating
the object), will continue to be handled the way they are handled in an individual k8s
cluster. Additionally, there is a shortcoming in the current implementation of the
federated deployments rolling update. There is an existing proposal addressing this as
part of the [federated deployment design doc](https://github.com/kubernetes/community/pull/325).
Once it is implemented, rolling updates for a federated deployment will also work fine
while a federated HPA is active on the same object.
# Conclusion
The design alternative 2 has the following major drawbacks, which are sufficient to
discard it as a viable implementation option:
- This option needs the federation control plane controller to collect metrics
data from each cluster, an overhead which grows with the number of federated
clusters in a given federation.
- The monitoring and update of objects which are targeted by the federated HPA object
(when needed) for a particular federated cluster would stop if, for whatever reason,
the network link between the federated cluster and the federation control plane is severed.
A bigger problem can happen in case of an outage of the federation control plane
altogether.
In Design Alternative 1 the autoscaling of replicas will continue, even if a given
cluster gets disconnected from federation or in case of a federation control plane
outage. This would happen because the local HPAs with the last known maxReplicas and
minReplicas would exist in the local clusters. Additionally, in this alternative there
is no need to collect and process the pod metrics for the target object from
each individual cluster.
This document proposes to use ***design alternative 1*** as the preferred implementation.
# Glossary