This change removes the code for the `Labels` and `Taints` interface
functions of the clusterapi provider when scaling from zero. The body
of these functions was added erroneously, and the Cluster API community
is still deciding how these values will be exposed to the autoscaler.
It also updates the tests and README to be clearer about the usage of
labels and taints when scaling from zero.
This commit is a combination of several commits. Significant details are
preserved below.
* update functions for resource annotations
This change converts some of the functions that look at annotations for
resource usage so that their names indicate that usage. This helps
make room for allowing the infrastructure reference as an alternate
source for the capacity information.
* migrate capacity logic into a single function
This change moves the logic that collects the instance capacity out of
the TemplateNodeInfo function and into a method of the
unstructuredScalableResource named InstanceCapacity. This new method
houses the logic that decides between annotations and the
infrastructure reference when calculating the capacity for the node.
* add ability to lookup infrastructure references
This change supplements the annotation lookups by adding the logic to
read the infrastructure reference, if it exists, to determine whether
the machine template exposes a capacity field in its status; a sketch
of this lookup follows the commit list below. For more information on
how this mechanism works, please see the cluster-api enhancement[0].
* add documentation for capi scaling from zero
* improve tests for clusterapi scale from zero
This change adds functionality to test the dynamic client behavior of
getting the infrastructure machine templates.
* update README with information about RBAC changes
This adds more information about the RBAC changes necessary for the
scale from zero support to work.
* remove extra check for scaling from zero
Since the CanScaleFromZero function checks whether both CPU and
memory are present, there is no need to check a second time; see the
sketch after the commit list below. This also adds some documentation
to the CanScaleFromZero function to make it clearer what is happening.
* update unit test for capi scale from zero
This adds a few more cases and details to the scale from zero unit
tests, including ensuring that the int-based annotations do not accept
other unit types.
[0] https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md
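As a rough illustration of the infrastructure reference lookup
mentioned in the commit list above, the sketch below reads a
`status.capacity` block from an unstructured infrastructure machine
template. The helper name, error handling, and sample values are
assumptions for illustration, not the provider's actual code; only the
`status.capacity` field path comes from the cluster-api enhancement[0].

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// capacityFromInfrastructureTemplate is a hypothetical helper, not the
// provider's real code. It reads the capacity block an infrastructure
// machine template may expose in its status, per the cluster-api
// opt-in autoscaling-from-zero enhancement.
func capacityFromInfrastructureTemplate(template *unstructured.Unstructured) (corev1.ResourceList, error) {
	raw, found, err := unstructured.NestedStringMap(template.Object, "status", "capacity")
	if err != nil {
		return nil, err
	}
	if !found {
		// No capacity in the template status; a caller would fall back
		// to the capacity annotations instead.
		return nil, nil
	}

	capacity := corev1.ResourceList{}
	for name, value := range raw {
		quantity, err := resource.ParseQuantity(value)
		if err != nil {
			return nil, fmt.Errorf("cannot parse %s=%q: %v", name, value, err)
		}
		capacity[corev1.ResourceName(name)] = quantity
	}
	return capacity, nil
}

func main() {
	template := &unstructured.Unstructured{Object: map[string]interface{}{
		"status": map[string]interface{}{
			"capacity": map[string]interface{}{
				"cpu":    "2",
				"memory": "8192Mi",
			},
		},
	}}

	capacity, err := capacityFromInfrastructureTemplate(template)
	if err != nil {
		panic(err)
	}
	for name, quantity := range capacity {
		fmt.Printf("%s: %s\n", name, quantity.String())
	}
}
```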
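Similarly, to make the "no need to check a second time" note above
concrete, here is a minimal sketch of the kind of check
CanScaleFromZero performs; the simplified signature is an assumption,
not the provider's exact implementation. Once it returns true, callers
already know that both CPU and memory capacity are available.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// canScaleFromZero is a simplified stand-in for the provider's
// CanScaleFromZero check: scaling from zero is only possible when both
// CPU and memory capacity are known, so a caller that sees a true
// result does not need to re-check the individual resources afterwards.
func canScaleFromZero(capacity corev1.ResourceList) bool {
	cpu, hasCPU := capacity[corev1.ResourceCPU]
	mem, hasMem := capacity[corev1.ResourceMemory]
	return hasCPU && hasMem && !cpu.IsZero() && !mem.IsZero()
}

func main() {
	capacity := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("2"),
		corev1.ResourceMemory: resource.MustParse("8192Mi"),
	}
	fmt.Println(canScaleFromZero(capacity)) // true
}
```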
This allows a Machine{Set,Deployment} to scale up/down from 0,
provided the following annotations are set:
```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```
Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods`
are optional.
For autoscaling from zero, the autoscaler should convert the memory
value received in the appropriate annotation to bytes using powers of
two, consistently with other providers, and fail if the format received
is not the expected one. This gives robust behaviour consistent with
cloud provider APIs and provider implementations.
https://cloud.google.com/compute/all-pricing
https://www.iec.ch/si/binary.htm
https://github.com/openshift/kubernetes-autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L366
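As a hedged sketch of that conversion (the helper name, and treating
the annotation as a plain integer megabyte count, are assumptions
rather than the provider's actual code), a megabyte value can be
turned into bytes using powers of two, mirroring the AWS manager
linked above, with any other format rejected:

```go
package main

import (
	"fmt"
	"strconv"

	"k8s.io/apimachinery/pkg/api/resource"
)

// memoryToQuantity is an illustrative helper; the name and the
// assumption that the annotation carries a plain integer megabyte
// count are not taken from the provider's code. It converts megabytes
// to bytes using powers of two (1 MB -> 1024*1024 bytes) and fails on
// any value it cannot parse as an integer.
func memoryToQuantity(annotationValue string) (*resource.Quantity, error) {
	mb, err := strconv.ParseInt(annotationValue, 10, 64)
	if err != nil {
		return nil, fmt.Errorf("unexpected memory annotation %q: %v", annotationValue, err)
	}
	return resource.NewQuantity(mb*1024*1024, resource.DecimalSI), nil
}

func main() {
	quantity, err := memoryToQuantity("8192")
	if err != nil {
		panic(err)
	}
	// 8192 MB expressed in bytes: 8589934592.
	fmt.Println(quantity.Value())
}
```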
Co-authored-by: Enxebre <alberto.garcial@hotmail.com>
Co-authored-by: Joel Speed <joel.speed@hotmail.co.uk>
Co-authored-by: Michael McCune <elmiko@redhat.com>
Because the autoscaler assumes it can delete nodes in parallel, it
fetches the nodegroup for each node in separate goroutines and then
instructs each nodegroup to delete a single node.
Because we don't share the nodegroup across goroutines, the cached
replica count in the scalable resource can become stale, and as such, if
the autoscaler attempts to scale down multiple nodes at a time, the
Cluster API provider only actually removes a single node.
To prevent this, we must ensure we have a fresh replica count for every
scale down attempt.
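The sketch below illustrates the shape of that fix, assuming the
replica count is exposed through the scale subresource; the function
name and signature are illustrative rather than the provider's actual
code. The key point is that the current replica count is read
immediately before each scale-down update instead of being reused from
a value cached when the node group was built.

```go
package example

import (
	"context"
	"fmt"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/scale"
)

// decreaseReplicas is a rough sketch, not the provider's exact code:
// it reads the live replica count from the scale subresource right
// before the update so that concurrent scale-down requests issued from
// other goroutines are not lost to a stale cached count.
func decreaseReplicas(ctx context.Context, scales scale.ScalesGetter, gr schema.GroupResource, namespace, name string, delta int32) error {
	// Fetch a fresh replica count instead of trusting a value cached
	// when the nodegroup object was built.
	current, err := scales.Scales(namespace).Get(ctx, gr, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	desired := current.Spec.Replicas - delta
	if desired < 0 {
		return fmt.Errorf("cannot scale %s/%s below zero", namespace, name)
	}

	update := &autoscalingv1.Scale{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec:       autoscalingv1.ScaleSpec{Replicas: desired},
	}
	_, err = scales.Scales(namespace).Update(ctx, gr, update, metav1.UpdateOptions{})
	return err
}
```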
Instead of retrieving it each time from k8s, which easily causes
client-side throttling and in turn makes each autoscaler run take
multiple seconds even when only a small number of NodeGroups is
involved and there is nothing to do.