
10 Commits

Author SHA1 Message Date
Michael McCune bb015b26a1 remove unsupported functionality from cluster-api provider
This change removes the code for the `Labels` and `Taints` interface
functions of the clusterapi provider when scaling from zero. The bodies
of these functions were added erroneously, and the Cluster API community
is still deciding how these values will be exposed to the autoscaler.

It also updates the tests and README to be clearer about the usage of
labels and taints when scaling from zero.
2022-10-14 14:06:57 -04:00
Michael McCune 1a65fde540 cleanup clusterapi scale from zero implementation
This commit is a combination of several commits. Significant details are
preserved below.

* update functions for resource annotations
  This change renames some of the functions that read resource usage from
  annotations so that their names indicate the annotation source. This
  makes room for allowing the infrastructure reference as an alternate
  source for the capacity information.

* migrate capacity logic into a single function
  This change moves the logic that collects the instance capacity from the
  TemplateNodeInfo function into a method of the
  unstructuredScalableResource named InstanceCapacity. This new method
  houses the logic that decides between annotations and the
  infrastructure reference when calculating the capacity for the node.

* add ability to lookup infrastructure references
  This change supplements the annotation lookups by adding the logic to
  read the infrastructure reference if it exists. This is done to
  determine if the machine template exposes a capacity field in its
  status. For more information on how this mechanism works, please see the
  cluster-api enhancement[0].

* add documentation for capi scaling from zero

* improve tests for clusterapi scale from zero
  This change adds tests for the dynamic client behavior of fetching the
  infrastructure machine templates.

* update README with information about rbac changes
  This adds more information about the RBAC changes necessary for the
  scale from zero support to work.

* remove extra check for scaling from zero
  Since the CanScaleFromZero function checks that both CPU and memory
  are present, there is no need to check a second time. This also adds
  some documentation to the CanScaleFromZero function to make it clearer
  what is happening.

* update unit test for capi scale from zero
  This adds a few more cases and details to the scale from zero unit
  tests, including ensuring that the integer-based annotations do not
  accept other unit types.
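A minimal Go sketch of the capacity lookup and the CanScaleFromZero check described in the list above. It assumes simplified types: `Capacity`, `scalableResource`, and its fields are hypothetical stand-ins for the provider's unstructuredScalableResource, and it assumes annotation values take precedence over the infrastructure template's `status.capacity`.

```go
package main

import "fmt"

// Capacity maps a resource name ("cpu", "memory", ...) to a quantity string.
type Capacity map[string]string

// scalableResource is a hypothetical, simplified stand-in for the
// provider's unstructuredScalableResource.
type scalableResource struct {
	annotations Capacity // capacity read from annotations on the MachineSet/MachineDeployment
	infraStatus Capacity // status.capacity of the infrastructure machine template; nil if absent
}

// InstanceCapacity prefers annotation values and falls back to the
// infrastructure template's status.capacity when no annotations are set.
func (r scalableResource) InstanceCapacity() Capacity {
	if len(r.annotations) > 0 {
		return r.annotations
	}
	if r.infraStatus != nil {
		return r.infraStatus
	}
	return Capacity{}
}

// CanScaleFromZero reports whether a template node can be built:
// both CPU and memory must be known.
func (r scalableResource) CanScaleFromZero() bool {
	c := r.InstanceCapacity()
	_, hasCPU := c["cpu"]
	_, hasMemory := c["memory"]
	return hasCPU && hasMemory
}

func main() {
	annotated := scalableResource{annotations: Capacity{"cpu": "2", "memory": "8Gi"}}
	fromTemplate := scalableResource{infraStatus: Capacity{"cpu": "4", "memory": "16Gi"}}
	cpuOnly := scalableResource{annotations: Capacity{"cpu": "2"}}

	fmt.Println(annotated.CanScaleFromZero())           // true
	fmt.Println(fromTemplate.InstanceCapacity()["cpu"]) // 4
	fmt.Println(cpuOnly.CanScaleFromZero())             // false
}
```

Checking both resources in one place is what makes a second check elsewhere redundant.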

[0] https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md
2022-07-22 20:21:32 -04:00
Andrew McDermott de90a462c7 Implement scale from zero for clusterapi
This allows a Machine{Set,Deployment} to scale up/down from 0,
provided the following annotations are set:

```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```

Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods`
are optional.
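The annotation handling above could be sketched in Go as follows, assuming a plain `map[string]string` of annotations. `scaleFromZeroInfo`, `parseScaleFromZero`, and `intAnnotation` are hypothetical names. Integer annotations are parsed with `strconv.Atoi`, so a value such as `2k` is rejected rather than interpreted.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys taken from the MachineSet example above.
const (
	cpuKey     = "machine.openshift.io/vCPU"
	memoryKey  = "machine.openshift.io/memoryMb"
	gpuKey     = "machine.openshift.io/GPU"
	maxPodsKey = "machine.openshift.io/maxPods"
)

// scaleFromZeroInfo is a hypothetical holder for the parsed values.
type scaleFromZeroInfo struct {
	CPU     int
	Memory  string // raw value; converting it to bytes is a separate step
	GPU     int    // 0 when the optional annotation is absent
	MaxPods int    // 0 when the optional annotation is absent
}

// intAnnotation parses an integer annotation, returning 0 if the key is absent.
// strconv.Atoi rejects unit suffixes, so "2k" is reported as an error.
func intAnnotation(annotations map[string]string, key string) (int, error) {
	v, ok := annotations[key]
	if !ok {
		return 0, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return 0, fmt.Errorf("annotation %s: %q is not an integer", key, v)
	}
	return n, nil
}

// parseScaleFromZero requires vCPU and memoryMb; GPU and maxPods are optional.
func parseScaleFromZero(annotations map[string]string) (scaleFromZeroInfo, error) {
	var info scaleFromZeroInfo
	for _, key := range []string{cpuKey, memoryKey} {
		if _, ok := annotations[key]; !ok {
			return info, fmt.Errorf("missing required annotation %s", key)
		}
	}
	var err error
	if info.CPU, err = intAnnotation(annotations, cpuKey); err != nil {
		return info, err
	}
	info.Memory = annotations[memoryKey]
	if info.GPU, err = intAnnotation(annotations, gpuKey); err != nil {
		return info, err
	}
	if info.MaxPods, err = intAnnotation(annotations, maxPodsKey); err != nil {
		return info, err
	}
	return info, nil
}

func main() {
	info, err := parseScaleFromZero(map[string]string{
		cpuKey: "2", memoryKey: "8G", gpuKey: "1", maxPodsKey: "100",
	})
	fmt.Println(info, err) // {2 8G 1 100} <nil>

	_, err = parseScaleFromZero(map[string]string{cpuKey: "2k", memoryKey: "8G"})
	fmt.Println(err != nil) // true
}
```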

For autoscaling from zero, the autoscaler should convert the memory value
received in the appropriate annotation to bytes using powers of two,
consistent with other providers, and fail if the received format is not
the expected one. This gives robust behaviour consistent with cloud
provider APIs and provider implementations.

https://cloud.google.com/compute/all-pricing
https://www.iec.ch/si/binary.htm
https://github.com/openshift/kubernetes-autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L366
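A Go sketch of the memory conversion described above. It assumes that the `K`, `M`, `G`, and `T` suffixes are binary multiples (so `8G` is 8 × 2^30 bytes) and that a bare integer means MiB, as the `memoryMb` key name suggests. `memoryToBytes` is a hypothetical helper, not the provider's actual function.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// binaryMultipliers maps a unit suffix to a power-of-two multiplier, so "G"
// means 2^30 bytes rather than 10^9. The accepted suffixes are an assumption.
var binaryMultipliers = map[string]int64{
	"K": 1 << 10,
	"M": 1 << 20,
	"G": 1 << 30,
	"T": 1 << 40,
}

// memoryToBytes converts a memory annotation such as "8G" to bytes. A bare
// integer is assumed to be MiB. Decimals, negative values, overflow and
// unknown suffixes (including "Gi" or "GB") are rejected instead of guessed at.
func memoryToBytes(value string) (int64, error) {
	value = strings.TrimSpace(value)
	if value == "" {
		return 0, fmt.Errorf("empty memory value")
	}
	number, multiplier := value, int64(1<<20) // default unit: MiB
	if m, ok := binaryMultipliers[value[len(value)-1:]]; ok {
		number, multiplier = value[:len(value)-1], m
	}
	n, err := strconv.ParseInt(number, 10, 64)
	if err != nil || n < 0 || n > math.MaxInt64/multiplier {
		return 0, fmt.Errorf("unexpected memory format %q", value)
	}
	return n * multiplier, nil
}

func main() {
	for _, v := range []string{"8G", "512M", "1024", "8GB"} {
		b, err := memoryToBytes(v)
		fmt.Println(v, b, err)
	}
	// 8G 8589934592 <nil>
	// 512M 536870912 <nil>
	// 1024 1073741824 <nil>
	// 8GB 0 unexpected memory format "8GB"
}
```

Failing on `8GB` or `8Gi` instead of silently picking a unit is the "fail if the format is not expected" behaviour the message asks for.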

Co-authored-by: Enxebre <alberto.garcial@hotmail.com>
Co-authored-by: Joel Speed <joel.speed@hotmail.co.uk>
Co-authored-by: Michael McCune <elmiko@redhat.com>
2022-07-18 13:50:25 -04:00
Joel Speed 9f670d4ea8 Ensure ClusterAPI DeleteNodes accounts for out-of-band scale changes
Because the autoscaler assumes it can delete nodes in parallel, it
fetches nodegroups for each node in separate goroutines and then
instructs each nodegroup to delete a single node.
Because we don't share the nodegroup across goroutines, the cached
replica count in the scalableresource can become stale. As a result, if
the autoscaler attempts to scale down multiple nodes at a time, the
Cluster API provider actually removes only a single node.

To prevent this, we must ensure we have a fresh replica count for every 
scale down attempt.
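The race can be reproduced with a toy model in Go. Everything here (`apiServer`, `nodeGroup`, `scaleDown`) is hypothetical: a mutex-guarded counter stands in for the API server, and the read-modify-write is serialized under that mutex for simplicity. A real cluster would need its own guarantee for that step, such as the API server's resourceVersion-based conflict detection.

```go
package main

import (
	"fmt"
	"sync"
)

// apiServer is a toy stand-in for the Kubernetes API holding a replica count.
type apiServer struct {
	mu       sync.Mutex
	replicas int
}

// nodeGroup mimics a per-node nodegroup carrying a cached replica count.
type nodeGroup struct {
	api            *apiServer
	cachedReplicas int
}

func newNodeGroup(api *apiServer) *nodeGroup {
	api.mu.Lock()
	defer api.mu.Unlock()
	return &nodeGroup{api: api, cachedReplicas: api.replicas}
}

// deleteNodeStale writes cachedReplicas-1, overwriting other goroutines' decrements.
func (g *nodeGroup) deleteNodeStale() {
	g.api.mu.Lock()
	defer g.api.mu.Unlock()
	g.api.replicas = g.cachedReplicas - 1
}

// deleteNodeFresh re-reads the current replica count before decrementing.
func (g *nodeGroup) deleteNodeFresh() {
	g.api.mu.Lock()
	defer g.api.mu.Unlock()
	current := g.api.replicas // fresh read instead of the cached value
	g.api.replicas = current - 1
}

// scaleDown deletes n nodes in parallel, one nodegroup per goroutine, and
// returns the final replica count.
func scaleDown(start, n int, fresh bool) int {
	api := &apiServer{replicas: start}
	// All nodegroups are fetched before any deletion, so every cache holds
	// the same starting count; this makes the stale case deterministic.
	groups := make([]*nodeGroup, n)
	for i := range groups {
		groups[i] = newNodeGroup(api)
	}
	var wg sync.WaitGroup
	for _, g := range groups {
		wg.Add(1)
		go func(g *nodeGroup) {
			defer wg.Done()
			if fresh {
				g.deleteNodeFresh()
			} else {
				g.deleteNodeStale()
			}
		}(g)
	}
	wg.Wait()
	return api.replicas
}

func main() {
	fmt.Println(scaleDown(5, 3, false)) // 4: only one node is actually removed
	fmt.Println(scaleDown(5, 3, true))  // 2: all three deletions take effect
}
```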
2022-01-21 16:08:00 +00:00
Kubernetes Prow Robot 12efcce4c7 Merge pull request #4443 from codablock/fix-rate-limitting
[clusterapi] Rely on replica count found in unstructuredScalableResource
2021-12-14 10:45:30 -08:00
Clinton Yeboah ecfaa6d700 removes deprecated CAPI annotations 2021-11-11 18:56:53 -05:00
Alexander Block 8b21473fc7 [clusterapi] Rely on replica count found in unstructuredScalableResource
Retrieving the replica count from k8s each time easily causes client-side
throttling, which in turn makes each autoscaler run take multiple seconds
even when only a small number of NodeGroups is involved and there is
nothing to do.
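A Go sketch of the difference between reading the cached object and fetching it on every call. `fakeClient` is hypothetical and only counts requests; `nestedInt64` is a dependency-free stand-in for `unstructured.NestedInt64` from k8s.io/apimachinery, and the method names are illustrative.

```go
package main

import "fmt"

// nestedInt64 walks obj along fields and returns an int64 leaf, similar in
// spirit to unstructured.NestedInt64 but without the apimachinery dependency.
func nestedInt64(obj map[string]interface{}, fields ...string) (int64, bool) {
	var cur interface{} = obj
	for _, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return 0, false
		}
		if cur, ok = m[f]; !ok {
			return 0, false
		}
	}
	v, ok := cur.(int64)
	return v, ok
}

// fakeClient counts GET requests; in a real cluster each one would count
// against the client-side rate limiter.
type fakeClient struct{ gets int }

func (c *fakeClient) get() map[string]interface{} {
	c.gets++
	return map[string]interface{}{"spec": map[string]interface{}{"replicas": int64(3)}}
}

// scalableResource keeps the unstructured object it was built from.
type scalableResource struct {
	client *fakeClient
	obj    map[string]interface{}
}

// ReplicasFromAPI issues a request on every call.
func (r *scalableResource) ReplicasFromAPI() int64 {
	n, _ := nestedInt64(r.client.get(), "spec", "replicas")
	return n
}

// Replicas reads spec.replicas from the cached object without a request.
func (r *scalableResource) Replicas() int64 {
	n, _ := nestedInt64(r.obj, "spec", "replicas")
	return n
}

func main() {
	client := &fakeClient{}
	r := &scalableResource{client: client, obj: client.get()} // one request at build time
	for i := 0; i < 100; i++ {
		r.Replicas()
	}
	fmt.Println(client.gets) // 1
	for i := 0; i < 100; i++ {
		r.ReplicasFromAPI()
	}
	fmt.Println(client.gets) // 101
}
```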
2021-11-04 11:09:27 +01:00
Jason DeTiberus 06e5f6a0ed Update group identifier to use for Cluster API annotations
- Also add backwards compatibility for the previously used deprecated annotations
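A Go sketch of a backwards-compatible annotation lookup. The two keys are assumptions based on the Cluster API group names (`cluster.x-k8s.io` and the older `cluster.k8s.io`), and treating a missing min-size annotation as 0 is likewise an assumption of this example; `lookupAnnotation` and `minSize` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strconv"
)

// Assumed annotation keys: the current group first, then the deprecated one.
const (
	minSizeKey           = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size"
	deprecatedMinSizeKey = "cluster.k8s.io/cluster-api-autoscaler-node-group-min-size"
)

// lookupAnnotation returns the value under the current key and falls back to
// the deprecated key, so objects annotated the old way keep working.
func lookupAnnotation(annotations map[string]string, current, deprecated string) (string, bool) {
	if v, ok := annotations[current]; ok {
		return v, true
	}
	v, ok := annotations[deprecated]
	return v, ok
}

// minSize parses the node group minimum size; a missing annotation means 0 here.
func minSize(annotations map[string]string) (int, error) {
	v, ok := lookupAnnotation(annotations, minSizeKey, deprecatedMinSizeKey)
	if !ok {
		return 0, nil
	}
	return strconv.Atoi(v)
}

func main() {
	old := map[string]string{deprecatedMinSizeKey: "1"}
	both := map[string]string{minSizeKey: "2", deprecatedMinSizeKey: "1"}
	fmt.Println(minSize(old))  // 1 <nil>
	fmt.Println(minSize(both)) // 2 <nil>
}
```

When both keys are present the current group wins, which lets users migrate annotations without a window where the value is ambiguous.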
2020-09-21 10:42:46 -04:00
Jason DeTiberus 75b850718f Add node autodiscovery to cluster-autoscaler clusterapi provider 2020-08-20 16:08:49 -04:00
Jason DeTiberus 18d44fc532 Convert clusterapi provider to use unstructured
Remove internal types for Cluster API and replace with unstructured access
2020-07-21 15:49:03 -04:00