This allows a Machine{Set,Deployment} to scale up/down from 0,
provided that the following annotations are set:
```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```
Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods`
are optional.
For autoscaling from zero, the autoscaler should convert the memory value
received in the appropriate annotation to bytes using powers of two,
consistently with other providers, and fail if the format received is not
the expected one. This gives robust behaviour consistent with cloud provider
APIs and provider implementations.
https://cloud.google.com/compute/all-pricing
https://www.iec.ch/si/binary.htm
https://github.com/openshift/kubernetes-autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L366
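The conversion described above could look roughly like the following sketch.
`parseMemoryToBytes` and the `binaryUnits` table are hypothetical names, and
the accepted formats are assumptions: a non-negative integer followed by a
unit suffix, with `G` treated as the power-of-two multiple just like `Gi`,
and a bare number rejected as an unexpected format. The real parsing in the
provider may differ; overflow handling is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// binaryUnits maps accepted suffixes to power-of-two multipliers.
// Two-letter suffixes come first so "Gi" is matched before "G".
// Treating "G" the same as "Gi" is an assumption of this sketch.
var binaryUnits = []struct {
	suffix string
	factor int64
}{
	{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40},
	{"K", 1 << 10}, {"M", 1 << 20}, {"G", 1 << 30}, {"T", 1 << 40},
}

// parseMemoryToBytes (hypothetical helper) converts an annotation value
// such as "8G" to bytes and fails on any format it does not recognise.
func parseMemoryToBytes(value string) (int64, error) {
	v := strings.TrimSpace(value)
	for _, u := range binaryUnits {
		if strings.HasSuffix(v, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(v, u.suffix), 10, 64)
			if err != nil || n < 0 {
				return 0, fmt.Errorf("unexpected memory format %q", value)
			}
			return n * u.factor, nil
		}
	}
	return 0, fmt.Errorf("unexpected memory format %q: missing unit suffix", value)
}

func main() {
	bytes, err := parseMemoryToBytes("8G")
	fmt.Println(bytes, err) // 8589934592 <nil>

	_, err = parseMemoryToBytes("lots")
	fmt.Println(err != nil) // true
}
```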
Co-authored-by: Enxebre <alberto.garcial@hotmail.com>
Co-authored-by: Joel Speed <joel.speed@hotmail.co.uk>
Co-authored-by: Michael McCune <elmiko@redhat.com>
Because the autoscaler assumes it can delete nodes in parallel, it
fetches nodegroups for each node in separate goroutines and then
instructs each nodegroup to delete a single node.
Because we don't share the nodegroup across goroutines, the cached
replica count in the scalable resource can become stale. As a result, if
the autoscaler attempts to scale down multiple nodes at a time, the
cluster api provider actually removes only a single node.
To prevent this, we must ensure we have a fresh replica count for every
scale down attempt.
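The effect of the stale count can be illustrated with a small sketch. The
types `fakeAPI` and `nodeGroup` and both delete methods are hypothetical
stand-ins, not the provider's code, and the two deletions run sequentially
to keep the example deterministic; it does not address concurrent writes.

```go
package main

import "fmt"

// fakeAPI stands in for the authoritative replica count of a MachineSet.
type fakeAPI struct{ replicas int }

// nodeGroup is a hypothetical per-goroutine copy that cached the replica
// count at the time it was fetched.
type nodeGroup struct {
	api            *fakeAPI
	cachedReplicas int
}

// deleteOneStale computes the new size from the cached count, so two
// copies created at the same time both write the same value.
func (g *nodeGroup) deleteOneStale() { g.api.replicas = g.cachedReplicas - 1 }

// deleteOneFresh re-reads the current count before decrementing.
func (g *nodeGroup) deleteOneFresh() { g.api.replicas = g.api.replicas - 1 }

// scaleDownTwice asks two independent nodeGroup copies to delete one node
// each, starting from 3 replicas, and returns the resulting replica count.
func scaleDownTwice(fresh bool) int {
	api := &fakeAPI{replicas: 3}
	a := &nodeGroup{api: api, cachedReplicas: api.replicas}
	b := &nodeGroup{api: api, cachedReplicas: api.replicas}
	for _, g := range []*nodeGroup{a, b} {
		if fresh {
			g.deleteOneFresh()
		} else {
			g.deleteOneStale()
		}
	}
	return api.replicas
}

func main() {
	fmt.Println(scaleDownTwice(false)) // 2: only one node is removed
	fmt.Println(scaleDownTwice(true))  // 1: both nodes are removed
}
```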
When calling Replicas(), the local struct in the scalable resource might be stale. To mitigate possible side effects, we always want to fetch a fresh replica count.
This is one in a series of PRs to mitigate kubernetes#3104
We index on providerID but it turns out that those values on node and
machine are not always consistent. Some encode region, some do not,
for example.
This commit normalizes all values through the normalizedProviderString().
To ensure that we catch all places I've introduced a new type and made
the find() functions take this new type in lieu of a string. Unit
tests have also been adjusted to introduce a 'test:///' prefix on the
providerID value to further validate the change.
This change allows CAPI to work out-of-the-box, assuming v1alpha2.
It's also reasonable to argue that this consistency should be
enforced elsewhere; to make this behaviour easy to revert, I'm
leaving this as a separate commit in this patch series.
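The idea of a dedicated type for normalized IDs can be sketched as follows.
The normalization rule used here (keep only the last path segment) and the
names `normalizeProviderID` and `machineIndex` are assumptions made for
illustration; they are not taken from the commit itself.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizedProviderID is a distinct type, so a plain string variable
// cannot be passed to find() without an explicit conversion.
type normalizedProviderID string

// normalizeProviderID (hypothetical rule) keeps only the last path
// segment, dropping scheme, region, or zone components.
func normalizeProviderID(raw string) normalizedProviderID {
	trimmed := strings.TrimRight(raw, "/")
	return normalizedProviderID(trimmed[strings.LastIndex(trimmed, "/")+1:])
}

// machineIndex maps normalized provider IDs to machine names.
type machineIndex map[normalizedProviderID]string

// find looks up a machine by its normalized provider ID.
func (idx machineIndex) find(id normalizedProviderID) (string, bool) {
	name, ok := idx[id]
	return name, ok
}

func main() {
	// The machine records the ID with a zone; the node records it without.
	idx := machineIndex{
		normalizeProviderID("test:///zone-a/instance-1"): "machine-1",
	}
	name, ok := idx.find(normalizeProviderID("test:///instance-1"))
	fmt.Println(name, ok) // machine-1 true
}
```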
The autoscaler expects each provider's nodeGroup implementation of the Nodes() function to return the instances belonging to the group, regardless of whether they have become a kubernetes node or not.
This information is then used, for instance, to detect unregistered nodes: bf3a9fb52e/cluster-autoscaler/clusterstate/clusterstate.go (L307-L311)
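A minimal sketch of that contract, using hypothetical `machine` and
`nodeGroup` types rather than the autoscaler's real interfaces: `nodes()`
reports every instance in the group, and the caller derives the
unregistered ones by comparing against the registered nodes it knows about.

```go
package main

import "fmt"

// machine is a hypothetical instance record; nodeName stays empty until
// the instance has registered as a kubernetes node.
type machine struct {
	providerID string
	nodeName   string
}

// nodeGroup is a hypothetical group of machines.
type nodeGroup struct{ machines []machine }

// nodes returns one provider ID per instance in the group, whether or not
// the instance has become a node yet.
func (g nodeGroup) nodes() []string {
	ids := make([]string, 0, len(g.machines))
	for _, m := range g.machines {
		ids = append(ids, m.providerID)
	}
	return ids
}

// unregistered returns the IDs reported by the group that are missing
// from the set of registered nodes.
func unregistered(groupIDs []string, registered map[string]bool) []string {
	var missing []string
	for _, id := range groupIDs {
		if !registered[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	g := nodeGroup{machines: []machine{
		{providerID: "test:///m1", nodeName: "node-1"},
		{providerID: "test:///m2"}, // still booting, no node yet
	}}
	registered := map[string]bool{"test:///m1": true}
	fmt.Println(unregistered(g.nodes(), registered)) // [test:///m2]
}
```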