Commit Graph

6 Commits

Author SHA1 Message Date
Michael McCune 1a65fde540 cleanup clusterapi scale from zero implementation
This commit is a combination of several commits. Significant details are
preserved below.

* update functions for resource annotations
  This change converts some of the functions that look at annotation for
  resource usage to indicate their usage in the function name. This helps
  to make room for allowing the infrastructure reference as an alternate
  source for the capacity information.

* migrate capacity logic into a single function
  This change moves the logic to collect the instance capacity from the
  TemplateNodeInfo function into a method of the
  unstructuredScalableResource named InstanceCapacity. This new function
  is created to house the logic that will decide between annotations and
  the infrastructure reference when calculating the capacity for the node.

* add ability to lookup infrastructure references
  This change supplements the annotation lookups by adding the logic to
  read the infrastructure reference if it exists. This is done to
  determine if the machine template exposes a capacity field in its
  status. For more information on how this mechanism works, please see the
  cluster-api enhancement[0].

* add documentation for capi scaling from zero

* improve tests for clusterapi scale from zero
  this change adds functionality to test the dynamic client behavior of
  getting the infrastructure machine templates.

* update README with information about rbac changes
  this adds more information about the rbac changes necessary for the
  scale from zero support to work.

* remove extra check for scaling from zero
  since the CanScaleFromZero function checks to see if both CPU and
  memory are present, there is no need to check a second time. This also
  adds some documentation to the CanScaleFromZero function to make it
  clearer what is happening.

* update unit test for capi scale from zero
  adding a few more cases and details to the scale from zero unit tests,
  including ensuring that the int based annotations do not accept other
  unit types.

[0] https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md
2022-07-22 20:21:32 -04:00
Andrew McDermott de90a462c7 Implement scale from zero for clusterapi
This allows a Machine{Set,Deployment} to scale up/down from 0,
providing the following annotations are set:

```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```

Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods`
are optional.

For autoscaling from zero, the autoscaler should convert the mem value
received in the appropriate annotation to bytes using powers of two
consistently with other providers and fail if the format received is not
expected. This gives robust behaviour consistent with cloud providers APIs
and providers implementations.

https://cloud.google.com/compute/all-pricing
https://www.iec.ch/si/binary.htm
https://github.com/openshift/kubernetes-autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L366

Co-authored-by:  Enxebre <alberto.garcial@hotmail.com>
Co-authored-by:  Joel Speed <joel.speed@hotmail.co.uk>
Co-authored-by:  Michael McCune <elmiko@redhat.com>
2022-07-18 13:50:25 -04:00
enxebre b2f1823c91 Get capi targetsize from cache
This ensured that access to replicas during scale down operations were never stale by accessing the API server https://github.com/kubernetes/autoscaler/issues/3104.
This honoured that behaviour while moving to unstructured client https://github.com/kubernetes/autoscaler/pull/3312.
This regressed that behaviour while trying to reduce the API server load https://github.com/kubernetes/autoscaler/pull/4443.
This put back the never stale replicas behaviour at the cost of loading back the API server https://github.com/kubernetes/autoscaler/pull/4634.

Currently on e.g a 48 minutes cluster it does 1.4k get request to the scale subresource.
This PR tries to satisfy both non stale replicas during scale down and prevent the API server from being overloaded. To achieve that it lets targetSize which is called on every autoscaling cluster state loop from come from cache.

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
2022-07-13 20:26:44 +02:00
Alexander Block 897c208ed1 Fix tests 2021-11-04 14:40:10 +01:00
Jason DeTiberus 75b850718f
Add node autodiscovery to cluster-autoscaler clusterapi provider 2020-08-20 16:08:49 -04:00
Jason DeTiberus 18d44fc532
Convert clusterapi provider to use unstructured
Remove internal types for Cluster API and replace with unstructured access
2020-07-21 15:49:03 -04:00