This change removes the code for the `Labels` and `Taints` interface
functions of the clusterapi provider when scaling from zero. The body
of these functions was added erroneously, and the Cluster API community
is still deciding how these values will be exposed to the autoscaler.
It also updates the tests and README to be clearer about the usage of
labels and taints when scaling from zero.
This commit is a combination of several commits. Significant details are
preserved below.
* update functions for resource annotations
This change converts some of the functions that look at annotations for
resource usage so that their names indicate that usage. This helps
make room for allowing the infrastructure reference as an alternate
source for the capacity information.
* migrate capacity logic into a single function
This change moves the logic that collects the instance capacity out of
the TemplateNodeInfo function and into a method of the
unstructuredScalableResource named InstanceCapacity. This new method
houses the logic that decides between annotations and the
infrastructure reference when calculating the capacity for the node.
* add ability to lookup infrastructure references
This change supplements the annotation lookups by adding the logic to
read the infrastructure reference, if it exists, to determine whether
the machine template exposes a capacity field in its status; a sketch
of this lookup follows the commit list below. For more information on
how this mechanism works, please see the cluster-api enhancement[0].
* add documentation for capi scaling from zero
* improve tests for clusterapi scale from zero
This change adds functionality to test the dynamic client behavior of
getting the infrastructure machine templates.
* update README with information about RBAC changes
This adds more information about the RBAC changes necessary for the
scale from zero support to work.
* remove extra check for scaling from zero
Since the CanScaleFromZero function checks whether both CPU and
memory are present, there is no need to check a second time; see the
sketch after the commit list below. This also adds some documentation
to the CanScaleFromZero function to make it clearer what is happening.
* update unit test for capi scale from zero
This adds a few more cases and details to the scale from zero unit
tests, including ensuring that the int-based annotations do not accept
other unit types.
[0] https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md
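As a rough illustration of the infrastructure reference lookup
mentioned in the commit list above, the sketch below reads a
`status.capacity` block from an unstructured infrastructure machine
template. The helper name, error handling, and sample values are
assumptions for illustration, not the provider's actual code; only the
`status.capacity` field path comes from the cluster-api enhancement[0].

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// capacityFromInfrastructureTemplate is a hypothetical helper, not the
// provider's real code. It reads the capacity block an infrastructure
// machine template may expose in its status, per the cluster-api
// opt-in autoscaling-from-zero enhancement.
func capacityFromInfrastructureTemplate(template *unstructured.Unstructured) (corev1.ResourceList, error) {
	raw, found, err := unstructured.NestedStringMap(template.Object, "status", "capacity")
	if err != nil {
		return nil, err
	}
	if !found {
		// No capacity in the template status; a caller would fall back
		// to the capacity annotations instead.
		return nil, nil
	}

	capacity := corev1.ResourceList{}
	for name, value := range raw {
		quantity, err := resource.ParseQuantity(value)
		if err != nil {
			return nil, fmt.Errorf("cannot parse %s=%q: %v", name, value, err)
		}
		capacity[corev1.ResourceName(name)] = quantity
	}
	return capacity, nil
}

func main() {
	template := &unstructured.Unstructured{Object: map[string]interface{}{
		"status": map[string]interface{}{
			"capacity": map[string]interface{}{
				"cpu":    "2",
				"memory": "8192Mi",
			},
		},
	}}

	capacity, err := capacityFromInfrastructureTemplate(template)
	if err != nil {
		panic(err)
	}
	for name, quantity := range capacity {
		fmt.Printf("%s: %s\n", name, quantity.String())
	}
}
```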
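Similarly, to make the "no need to check a second time" note above
concrete, here is a minimal sketch of the kind of check
CanScaleFromZero performs; the simplified signature is an assumption,
not the provider's exact implementation. Once it returns true, callers
already know that both CPU and memory capacity are available.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// canScaleFromZero is a simplified stand-in for the provider's
// CanScaleFromZero check: scaling from zero is only possible when both
// CPU and memory capacity are known, so a caller that sees a true
// result does not need to re-check the individual resources afterwards.
func canScaleFromZero(capacity corev1.ResourceList) bool {
	cpu, hasCPU := capacity[corev1.ResourceCPU]
	mem, hasMem := capacity[corev1.ResourceMemory]
	return hasCPU && hasMem && !cpu.IsZero() && !mem.IsZero()
}

func main() {
	capacity := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("2"),
		corev1.ResourceMemory: resource.MustParse("8192Mi"),
	}
	fmt.Println(canScaleFromZero(capacity)) // true
}
```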
This allows a Machine{Set,Deployment} to scale up/down from 0,
provided the following annotations are set:
```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```
Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods`
are optional.
For autoscaling from zero, the autoscaler should convert the memory
value received in the appropriate annotation to bytes using powers of
two, consistently with other providers, and fail if the format received
is not the expected one. This gives robust behaviour consistent with
cloud provider APIs and provider implementations.
https://cloud.google.com/compute/all-pricing
https://www.iec.ch/si/binary.htm
https://github.com/openshift/kubernetes-autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L366
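As a hedged sketch of that conversion (the helper name, and treating
the annotation as a plain integer megabyte count, are assumptions
rather than the provider's actual code), a megabyte value can be
turned into bytes using powers of two, mirroring the AWS manager
linked above, with any other format rejected:

```go
package main

import (
	"fmt"
	"strconv"

	"k8s.io/apimachinery/pkg/api/resource"
)

// memoryToQuantity is an illustrative helper; the name and the
// assumption that the annotation carries a plain integer megabyte
// count are not taken from the provider's code. It converts megabytes
// to bytes using powers of two (1 MB -> 1024*1024 bytes) and fails on
// any value it cannot parse as an integer.
func memoryToQuantity(annotationValue string) (*resource.Quantity, error) {
	mb, err := strconv.ParseInt(annotationValue, 10, 64)
	if err != nil {
		return nil, fmt.Errorf("unexpected memory annotation %q: %v", annotationValue, err)
	}
	return resource.NewQuantity(mb*1024*1024, resource.DecimalSI), nil
}

func main() {
	quantity, err := memoryToQuantity("8192")
	if err != nil {
		panic(err)
	}
	// 8192 MB expressed in bytes: 8589934592.
	fmt.Println(quantity.Value())
}
```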
Co-authored-by: Enxebre <alberto.garcial@hotmail.com>
Co-authored-by: Joel Speed <joel.speed@hotmail.co.uk>
Co-authored-by: Michael McCune <elmiko@redhat.com>
Because the autoscaler assumes it can delete nodes in parallel, it
fetches the nodegroup for each node in separate goroutines and then
instructs each nodegroup to delete a single node.
Because we don't share the nodegroup across goroutines, the cached
replica count in the scalable resource can become stale, and as such, if
the autoscaler attempts to scale down multiple nodes at a time, the
Cluster API provider only actually removes a single node.
To prevent this, we must ensure we have a fresh replica count for every
scale down attempt.
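The sketch below illustrates the shape of that fix, assuming the
replica count is exposed through the scale subresource; the function
name and signature are illustrative rather than the provider's actual
code. The key point is that the current replica count is read
immediately before each scale-down update instead of being reused from
a value cached when the node group was built.

```go
package example

import (
	"context"
	"fmt"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/scale"
)

// decreaseReplicas is a rough sketch, not the provider's exact code:
// it reads the live replica count from the scale subresource right
// before the update so that concurrent scale-down requests issued from
// other goroutines are not lost to a stale cached count.
func decreaseReplicas(ctx context.Context, scales scale.ScalesGetter, gr schema.GroupResource, namespace, name string, delta int32) error {
	// Fetch a fresh replica count instead of trusting a value cached
	// when the nodegroup object was built.
	current, err := scales.Scales(namespace).Get(ctx, gr, name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	desired := current.Spec.Replicas - delta
	if desired < 0 {
		return fmt.Errorf("cannot scale %s/%s below zero", namespace, name)
	}

	update := &autoscalingv1.Scale{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec:       autoscalingv1.ScaleSpec{Replicas: desired},
	}
	_, err = scales.Scales(namespace).Update(ctx, gr, update, metav1.UpdateOptions{})
	return err
}
```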
Instead of retrieving it each time from k8s, which easily causes
client-side throttling and in turn makes each autoscaler run take
multiple seconds even when only a small number of NodeGroups is
involved and there is nothing to do.