Commit Graph

15 Commits

Author SHA1 Message Date
Alessandro Di Stefano 3aedfc9929
Merge 16cfe50486 into a9292351c3 2025-09-18 04:45:32 -07:00
Bartłomiej Wróblewski a366e629ce Fix unit-tests 2025-09-15 16:16:46 +00:00
Bartłomiej Wróblewski d26038112d Update resource api to use v1, add missing DRA methods 2025-09-15 12:33:58 +00:00
aleskandro 16cfe50486
[Tests] Update cluster-api provider to use machineTemplate.status.nodeInfo for architecture-aware autoscale from zero
kubernetes-sigs/cluster-api#11962 introduced the nodeInfo field for MachineTemplates. Providers can reconcile this field in the status subresource to inform the autoscaler of the architecture and operating system that the MachineTemplate's nodes will run on.

Previously, the cluster autoscaler implemented this behavior by leveraging the labels capacity annotation and, as a fallback, default values set in environment variables at cluster-autoscaler deployment time.

With this commit, the cluster autoscaler computes the future architecture of a node in the following priority order (see the sketch after this list):

- Labels set on existing nodes, for the non-scale-from-zero case
- Labels set in the labels capacity annotation of the MachineTemplate, MachineSet, or MachineDeployment
- Values in status.nodeInfo of the MachineTemplate
- Generic/default labels set in the environment of the cluster autoscaler
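A minimal Go sketch of that fallback chain (illustrative only: the function and parameter names are hypothetical stand-ins, not the provider's actual code):

```go
// resolveArchitecture returns the architecture to assume for a template
// node, following the priority order above.
func resolveArchitecture(nodeLabels, capacityAnnotationLabels map[string]string, nodeInfoArch, envDefault string) string {
	// 1. Labels on existing nodes (non-scale-from-zero case).
	if arch, ok := nodeLabels["kubernetes.io/arch"]; ok {
		return arch
	}
	// 2. Labels from the labels capacity annotation on the
	// MachineTemplate, MachineSet, or MachineDeployment.
	if arch, ok := capacityAnnotationLabels["kubernetes.io/arch"]; ok {
		return arch
	}
	// 3. status.nodeInfo reconciled on the MachineTemplate
	// (kubernetes-sigs/cluster-api#11962).
	if nodeInfoArch != "" {
		return nodeInfoArch
	}
	// 4. Default from the cluster-autoscaler environment.
	return envDefault
}
```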
2025-09-05 06:58:57 +01:00
Stefan Bueringer c3257e9d0b
Fix scale to 0 for Cluster API NodePool
Signed-off-by: Stefan Büringer <buringerst@vmware.com>
2025-08-07 06:49:36 +02:00
Tsubasa Watanabe 3fbacf0d0f cluster-api: node template in scale-from-0-nodes scenario with DRA
Modify TemplateNodeInfo() to also return a ResourceSlice template. This addresses
the DRA expansion of the Cluster Autoscaler, allowing users to set the number of
GPUs and the DRA driver name by specifying an annotation on the NodeGroup
provided by cluster-api.
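A rough Go sketch of the idea, assuming the v1 resource API; the annotation keys and the function name here are illustrative assumptions, not the provider's documented interface:

```go
import (
	"fmt"
	"strconv"

	resourceapi "k8s.io/api/resource/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// templateResourceSlice builds the ResourceSlice a template node could carry,
// from hypothetical node-group annotations for the driver name and GPU count.
func templateResourceSlice(nodeName string, annotations map[string]string) *resourceapi.ResourceSlice {
	driver := annotations["capacity.cluster-autoscaler.kubernetes.io/dra-driver"]                // illustrative key
	count, _ := strconv.Atoi(annotations["capacity.cluster-autoscaler.kubernetes.io/gpu-count"]) // illustrative key

	devices := make([]resourceapi.Device, 0, count)
	for i := 0; i < count; i++ {
		devices = append(devices, resourceapi.Device{Name: fmt.Sprintf("gpu-%d", i)})
	}
	return &resourceapi.ResourceSlice{
		ObjectMeta: metav1.ObjectMeta{Name: nodeName + "-template-slice"},
		Spec: resourceapi.ResourceSliceSpec{
			Driver:  driver,
			Pool:    resourceapi.ResourcePool{Name: nodeName, ResourceSliceCount: 1},
			Devices: devices,
		},
	}
}
```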

Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
2025-02-12 11:56:04 +09:00
Matt Boersma 6040d29252
[cluster-api] Handle ignored errors 2023-02-28 13:56:29 -07:00
Cameron McAvoy e63244a136 Use capacity annotations to set labels and taints for cluster api node groups 2023-02-03 11:18:16 -06:00
Cameron McAvoy e8c5ad5426 clusterapi: Add ephemeral disk capacity annotation to MachineDeployments for scale from zero 2023-01-12 12:53:38 -06:00
Michael McCune 1a65fde540 cleanup clusterapi scale from zero implementation
This commit is a combination of several commits. Significant details are
preserved below.

* update functions for resource annotations
  This change updates some of the functions that look at annotations for
  resource usage so that the function names indicate that usage. This helps
  make room for allowing the infrastructure reference as an alternate
  source for the capacity information.

* migrate capacity logic into a single function
  This change moves the logic to collect the instance capacity from the
  TemplateNodeInfo function into a method of the
  unstructuredScalableResource named InstanceCapacity. This new function
  houses the logic that decides between annotations and the infrastructure
  reference when calculating the capacity for the node (a condensed sketch
  follows at the end of this message).

* add ability to lookup infrastructure references
  This change supplements the annotation lookups by adding the logic to
  read the infrastructure reference if it exists. This is done to
  determine if the machine template exposes a capacity field in its
  status. For more information on how this mechanism works, please see the
  cluster-api enhancement[0].

* add documentation for capi scaling from zero

* improve tests for clusterapi scale from zero
  this change adds functionality to test the dynamic client behavior of
  getting the infrastructure machine templates.

* update README with information about rbac changes
  this adds more information about the rbac changes necessary for the
  scale from zero support to work.

* remove extra check for scaling from zero
  since the CanScaleFromZero function checks to see if both CPU and
  memory are present, there is no need to check a second time. This also
  adds some documentation to the CanScaleFromZero function to make it
  clearer what is happening.

* update unit test for capi scale from zero
  adding a few more cases and details to the scale from zero unit tests,
  including ensuring that the int based annotations do not accept other
  unit types.

[0] https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md
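A condensed Go sketch of the resulting lookup order (a hypothetical helper, not the actual InstanceCapacity implementation; the annotation key follows the capacity.cluster-autoscaler.kubernetes.io/* convention, and the fallback reads status.capacity as described in the enhancement [0]):

```go
import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

func instanceCapacity(annotations map[string]string, infraTemplate *unstructured.Unstructured) (corev1.ResourceList, error) {
	capacity := corev1.ResourceList{}

	// 1. Capacity annotations on the scalable resource win.
	if v, ok := annotations["capacity.cluster-autoscaler.kubernetes.io/cpu"]; ok {
		q, err := resource.ParseQuantity(v)
		if err != nil {
			return nil, err
		}
		capacity[corev1.ResourceCPU] = q
	}
	// (memory, gpu-count, ephemeral-disk follow the same pattern)

	// 2. Fall back to status.capacity on the referenced infrastructure
	// machine template, if the provider exposes it.
	if len(capacity) == 0 && infraTemplate != nil {
		raw, found, err := unstructured.NestedStringMap(infraTemplate.Object, "status", "capacity")
		if err != nil || !found {
			return capacity, err
		}
		for name, value := range raw {
			q, err := resource.ParseQuantity(value)
			if err != nil {
				return nil, err
			}
			capacity[corev1.ResourceName(name)] = q
		}
	}
	return capacity, nil
}
```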
2022-07-22 20:21:32 -04:00
Andrew McDermott de90a462c7 Implement scale from zero for clusterapi
This allows a Machine{Set,Deployment} to scale up/down from 0,
provided the following annotations are set:

```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```

Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods`
are optional.

For autoscaling from zero, the autoscaler converts the memory value
received in the annotation to bytes using powers of two, consistently with
other providers, and fails if the received format is unexpected. This gives
robust behaviour consistent with cloud provider APIs and existing provider
implementations (see the sketch after the links below).

https://cloud.google.com/compute/all-pricing
https://www.iec.ch/si/binary.htm
https://github.com/openshift/kubernetes-autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L366
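A worked Go sketch of that conversion (illustrative, not the autoscaler's actual parser): the annotation value is interpreted with power-of-two multipliers, and anything unparseable is rejected rather than guessed at.

```go
import (
	"fmt"
	"strconv"
	"strings"
)

// memoryToBytes converts an annotation value such as "8G" to bytes using
// binary (power-of-two) units, so "8G" means 8 * 1024^3 = 8589934592 bytes.
func memoryToBytes(raw string) (int64, error) {
	multipliers := map[string]int64{"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
	for suffix, mult := range multipliers {
		if n, ok := strings.CutSuffix(raw, suffix); ok {
			v, err := strconv.ParseInt(n, 10, 64)
			if err != nil {
				return 0, fmt.Errorf("unexpected memory format %q: %w", raw, err)
			}
			return v * mult, nil
		}
	}
	v, err := strconv.ParseInt(raw, 10, 64) // no suffix: plain bytes
	if err != nil {
		return 0, fmt.Errorf("unexpected memory format %q: %w", raw, err)
	}
	return v, nil
}
```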

Co-authored-by: Enxebre <alberto.garcial@hotmail.com>
Co-authored-by: Joel Speed <joel.speed@hotmail.co.uk>
Co-authored-by: Michael McCune <elmiko@redhat.com>
2022-07-18 13:50:25 -04:00
enxebre b2f1823c91 Get capi targetsize from cache
This ensured that access to replicas during scale-down operations was never stale, by always going to the API server: https://github.com/kubernetes/autoscaler/issues/3104.
This honoured that behaviour while moving to the unstructured client: https://github.com/kubernetes/autoscaler/pull/3312.
This regressed that behaviour while trying to reduce the API server load: https://github.com/kubernetes/autoscaler/pull/4443.
This put back the never-stale replicas behaviour at the cost of loading the API server again: https://github.com/kubernetes/autoscaler/pull/4634.

Currently, on e.g. a 48-minute-old cluster, it makes ~1.4k GET requests to the scale subresource.
This PR tries to satisfy both goals: non-stale replicas during scale down, and an API server that is not overloaded. To achieve that, it lets targetSize, which is called on every autoscaling cluster-state loop, come from the cache (see the sketch below).

Also note that the scale-down implementation has changed: https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
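A minimal Go sketch of that compromise, with hypothetical type and method names:

```go
import "sync"

type cachedScalableResource struct {
	mu       sync.Mutex
	replicas int
}

// Refresh is called once per cluster-state loop, e.g. from the informer
// cache, rather than on every TargetSize call.
func (r *cachedScalableResource) Refresh(replicas int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.replicas = replicas
}

// TargetSize answers from the cache, avoiding a GET on the scale
// subresource for every autoscaling loop.
func (r *cachedScalableResource) TargetSize() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.replicas
}

// SetSize writes through to the API server (elided) and updates the cache
// so that subsequent scale-down reads are not stale.
func (r *cachedScalableResource) SetSize(replicas int) {
	// ... update the scale subresource here ...
	r.mu.Lock()
	defer r.mu.Unlock()
	r.replicas = replicas
}
```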
2022-07-13 20:26:44 +02:00
Alexander Block 897c208ed1 Fix tests 2021-11-04 14:40:10 +01:00
Jason DeTiberus 75b850718f
Add node autodiscovery to cluster-autoscaler clusterapi provider 2020-08-20 16:08:49 -04:00
Jason DeTiberus 18d44fc532
Convert clusterapi provider to use unstructured
Remove internal types for Cluster API and replace with unstructured access
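A minimal Go sketch of what unstructured access looks like for one field, using the generic helpers from apimachinery (the wrapper function is illustrative):

```go
import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// replicasOf reads spec.replicas from an unstructured MachineSet or
// MachineDeployment without any typed Cluster API structs.
func replicasOf(u *unstructured.Unstructured) (int64, error) {
	replicas, found, err := unstructured.NestedInt64(u.Object, "spec", "replicas")
	if err != nil {
		return 0, err
	}
	if !found {
		return 0, nil // treat a missing spec.replicas as zero
	}
	return replicas, nil
}
```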
2020-07-21 15:49:03 -04:00