Commit Graph

18 Commits

Author SHA1 Message Date
Andrew McDermott de90a462c7 Implement scale from zero for clusterapi
This allows a Machine{Set,Deployment} to scale up/down from 0,
providing the following annotations are set:

```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```

Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods`
are optional.

For autoscaling from zero, the autoscaler should convert the mem value
received in the appropriate annotation to bytes using powers of two
consistently with other providers and fail if the format received is not
expected. This gives robust behaviour consistent with cloud providers APIs
and providers implementations.

https://cloud.google.com/compute/all-pricing
https://www.iec.ch/si/binary.htm
https://github.com/openshift/kubernetes-autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L366

Co-authored-by:  Enxebre <alberto.garcial@hotmail.com>
Co-authored-by:  Joel Speed <joel.speed@hotmail.co.uk>
Co-authored-by:  Michael McCune <elmiko@redhat.com>
2022-07-18 13:50:25 -04:00
enxebre b2f1823c91 Get capi targetsize from cache
This ensured that access to replicas during scale down operations were never stale by accessing the API server https://github.com/kubernetes/autoscaler/issues/3104.
This honoured that behaviour while moving to unstructured client https://github.com/kubernetes/autoscaler/pull/3312.
This regressed that behaviour while trying to reduce the API server load https://github.com/kubernetes/autoscaler/pull/4443.
This put back the never stale replicas behaviour at the cost of loading back the API server https://github.com/kubernetes/autoscaler/pull/4634.

Currently on e.g a 48 minutes cluster it does 1.4k get request to the scale subresource.
This PR tries to satisfy both non stale replicas during scale down and prevent the API server from being overloaded. To achieve that it lets targetSize which is called on every autoscaling cluster state loop from come from cache.

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
2022-07-13 20:26:44 +02:00
Joel Speed 9f670d4ea8
Ensure ClusterAPI DeleteNodes accounts for out of band changes scale
Because the autoscaler assumes it can delete nodes in parallel, it 
fetches nodegroups for each node in separate go routines and then 
instructs each nodegroup to delete a single node.
Because we don't share the nodegroup across go routines, the cached 
replica count in the scalableresource can become stale and as such, if 
the autoscaler attempts to scale down multiple nodes at a time, the 
cluster api provider only actually removes a single node.

To prevent this, we must ensure we have a fresh replica count for every 
scale down attempt.
2022-01-21 16:08:00 +00:00
Kubernetes Prow Robot 12efcce4c7
Merge pull request #4443 from codablock/fix-rate-limitting
[clusterapi] Rely on replica count found in unstructuredScalableResource
2021-12-14 10:45:30 -08:00
Clinton Yeboah ecfaa6d700 removes deprecated CAPI annotations 2021-11-11 18:56:53 -05:00
Alexander Block 897c208ed1 Fix tests 2021-11-04 14:40:10 +01:00
Jason DeTiberus 06e5f6a0ed
Update group identifier to use for Cluster API annotations
- Also add backwards compatibility for the previously used deprecated annotations
2020-09-21 10:42:46 -04:00
Jason DeTiberus 75b850718f
Add node autodiscovery to cluster-autoscaler clusterapi provider 2020-08-20 16:08:49 -04:00
Jason DeTiberus 63f9e40d82
Improve Cluster API tests to work better with constrained resources 2020-08-19 13:31:32 -04:00
Jason DeTiberus 18d44fc532
Convert clusterapi provider to use unstructured
Remove internal types for Cluster API and replace with unstructured access
2020-07-21 15:49:03 -04:00
Joel Speed be6edb4a3e Rewrite DeleteNodesTwice test to check API not TargetSize for
cluster-autoscaler CAPI provider
2020-06-02 14:51:11 -04:00
Enxebre 9c8b78aa79 Get replicas always from API server for cluster-autoscaler CAPI provider
When getting Replicas() the local struct in the scalable resource might be stale. To mitigate possible side effects, we want always get a fresh replicas.

This is one in a series of PR to mitigate kubernetes#3104
2020-06-02 14:45:58 -04:00
Joel Speed 5e0126ada5
Do not normalize Node IDs outside of CAPI provider 2020-04-16 10:32:27 +01:00
Joel Speed d23d3a1dd5
Add testing for fake provider IDs 2020-04-02 15:24:57 +01:00
Andrew McDermott d9e3197daa Normalize providerID values
We index on providerID but it turns out that those values on node and
machine are not always consistent. Some encode region, some do not,
for example.

This commit normalizes all values through the normalizedProviderString().

To ensure that we catch all places I've introduced a new type and made
the find() functions take this new type in lieu of a string. Unit
tests have also been adjusted to introduce a 'test:///' prefix on the
providerID value to further validate the change.

This change allows CAPI to work out-of-the-box, assuming v1alpha2.

It's also reasonable to assert that this consistency should be
enforced elsewhere and to make this behaviour easily revertable I'm
leaving this as a separate commit in this patch series.
2020-03-10 10:59:05 +00:00
Joel Speed eae1579100 Ensure DeleteNodes doesn't delete a node twice 2020-03-10 10:59:05 +00:00
Enxebre 699c0b83b4 Let Nodes() return the list of all machines
The autoscaler expects provider implementations nodeGroups to implement the Nodes() function to return the number of instances belonging to the group regardless of they have become a kubernetes node or not.
This information is then used for instance to realise about unregistered nodes bf3a9fb52e/cluster-autoscaler/clusterstate/clusterstate.go (L307-L311)
2020-03-10 10:59:05 +00:00
Andrew McDermott 46bb9b4f29 cloudprovider/clusterapi: new provider
This adds a new cloudprovider based on the cluster-api project:

  https://github.com/kubernetes-sigs/cluster-api
2020-03-10 10:59:04 +00:00