autoscaler

Commit Graph

Author	SHA1	Message	Date
elmiko	003e6cd67c	make DecreaseTargetSize more accurate for clusterapi this change ensures that when DecreaseTargetSize counts the nodes, it does not include any instances which are considered to be pending (i.e. not having a node ref), deleting, or failed. this change will allow the core autoscaler to then decrease the size of the node group accordingly, instead of raising an error. This change also adds some code to the unit tests to make detection of this condition easier.	2025-03-17 19:34:07 -04:00
Bartłomiej Wróblewski	3b47908e51	Add ForceDeleteNodes method to NodeGroup interface	2024-11-18 13:55:07 +00:00
Kuba Tużnik	879c6a84a4	DRA: migrate all of CA to use the new internal NodeInfo/PodInfo The new wrapper types should behave like the direct schedulerframework types for most purposes, so most of the migration is just changing the imported package. Constructors look a bit different, so they have to be adapted - mostly in test code. Accesses to the Pods field have to be changed to a method call. After this, the schedulerframework types are only used in the new wrappers, and in the parts of simulator/ that directly interact with the scheduler framework. The rest of CA codebase operates on the new wrapper types.	2024-11-05 16:43:43 +01:00
elmiko	b4fda55266	cleanup clusterapi imports this change fixes the import order so that the goimports tool does not complain about the ordering.	2024-10-14 09:43:12 -04:00
Michael Weibel	824c108853	feat(clusterapi): per nodeGroup autoscaling options	2024-04-22 15:42:08 +02:00
Kubernetes Prow Robot	c16e0cd47b	Merge pull request #6628 from MaxFedotov/issues/6549 [clusterapi] Do not skip nodegroups with minSize=maxSize	2024-03-19 07:35:51 -07:00
Yaroslava Serdiuk	9cdced4cfd	Add AtomicScaleUp method to NodeGroup interface	2024-03-14 12:18:28 +00:00
Max Fedotov	30ebf5c7ac	[clusterapi] Do not skip nodegroups with minSize=maxSize	2024-03-13 16:02:47 +01:00
aleskandro	54d3a4c714	ClusterAPI: Allow overriding the kubernetes.io/arch label set by the scale from zero method via environment variable The architecture label in the build generic labels method of the cluster API (CAPI) provider is now populated using the GetDefaultScaleFromZeroArchitecture().Name() method. The method allows CAPI users deploying the cluster-autoscaler to define the default architecture to be used by the cluster-autoscaler for scale up from zero via the env var CAPI_SCALE_ZERO_DEFAULT_ARCH. Amd64 is kept as a fallback for historical reasons. The introduced changes will not take into account the case of nodes heterogeneous in architecture. The labels generation to infer properties like the cpu architecture from the node groups' features should be considered as a CAPI provider specific implementation.	2023-10-19 19:43:52 +02:00
aleskandro	398ffaf82f	Fix the buildTemplateLabels method for the ClusterApi provider The joinStringMaps call in the buildTemplateLabels method of the clusterApi provider should not overwrite any custom labels with the generic ones returned by buildGenericLabels()	2023-04-26 10:37:13 +02:00
Matt Boersma	17d2bd968e	[clusterapi] Add support for MachinePools squash	2023-04-12 09:22:49 -06:00
Paco Xu	8dec2025f8	Stop applying the beta.kubernetes.io/os and arch	2022-10-27 12:20:04 +08:00
Michael McCune	1a65fde540	cleanup clusterapi scale from zero implementation This commit is a combination of several commits. Significant details are preserved below. * update functions for resource annotations This change converts some of the functions that look at annotation for resource usage to indicate their usage in the function name. This helps to make room for allowing the infrastructure reference as an alternate source for the capacity information. * migrate capacity logic into a single function This change moves the logic to collect the instance capacity from the TemplateNodeInfo function into a method of the unstructuredScalableResource named InstanceCapacity. This new function is created to house the logic that will decide between annotations and the infrastructure reference when calculating the capacity for the node. * add ability to lookup infrastructure references This change supplements the annotation lookups by adding the logic to read the infrastructure reference if it exists. This is done to determine if the machine template exposes a capacity field in its status. For more information on how this mechanism works, please see the cluster-api enhancement[0]. * add documentation for capi scaling from zero * improve tests for clusterapi scale from zero this change adds functionality to test the dynamic client behavior of getting the infrastructure machine templates. * update README with information about rbac changes this adds more information about the rbac changes necessary for the scale from zero support to work. * remove extra check for scaling from zero since the CanScaleFromZero function checks to see if both CPU and memory are present, there is no need to check a second time. This also adds some documentation to the CanScaleFromZero function to make it clearer what is happening. * update unit test for capi scale from zero adding a few more cases and details to the scale from zero unit tests, including ensuring that the int based annotations do not accept other unit types. [0] https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md	2022-07-22 20:21:32 -04:00
Andrew McDermott	de90a462c7	Implement scale from zero for clusterapi This allows a Machine{Set,Deployment} to scale up/down from 0, providing the following annotations are set: ```yaml apiVersion: v1 items: - apiVersion: machine.openshift.io/v1beta1 kind: MachineSet metadata: annotations: machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0" machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6" machine.openshift.io/vCPU: "2" machine.openshift.io/memoryMb: 8G machine.openshift.io/GPU: "1" machine.openshift.io/maxPods: "100" ``` Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods` are optional. For autoscaling from zero, the autoscaler should convert the mem value received in the appropriate annotation to bytes using powers of two consistently with other providers and fail if the format received is not expected. This gives robust behaviour consistent with cloud providers APIs and providers implementations. https://cloud.google.com/compute/all-pricing https://www.iec.ch/si/binary.htm https://github.com/openshift/kubernetes-autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/aws_manager.go#L366 Co-authored-by: Enxebre <alberto.garcial@hotmail.com> Co-authored-by: Joel Speed <joel.speed@hotmail.co.uk> Co-authored-by: Michael McCune <elmiko@redhat.com>	2022-07-18 13:50:25 -04:00
enxebre	b2f1823c91	Get capi targetsize from cache This ensured that access to replicas during scale down operations were never stale by accessing the API server https://github.com/kubernetes/autoscaler/issues/3104. This honoured that behaviour while moving to unstructured client https://github.com/kubernetes/autoscaler/pull/3312. This regressed that behaviour while trying to reduce the API server load https://github.com/kubernetes/autoscaler/pull/4443. This put back the never stale replicas behaviour at the cost of loading back the API server https://github.com/kubernetes/autoscaler/pull/4634. Currently on e.g. a 48 minutes cluster it does 1.4k get request to the scale subresource. This PR tries to satisfy both non stale replicas during scale down and prevent the API server from being overloaded. To achieve that it lets targetSize, which is called on every autoscaling cluster state loop, come from cache. Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.	2022-07-13 20:26:44 +02:00
GuyTempleton	b7b5df50ca	CA - Update gofmt of CAPI_nodegroup.go	2021-11-14 19:41:31 +00:00
Clinton Yeboah	ecfaa6d700	removes deprecated CAPI annotations	2021-11-11 18:56:53 -05:00
Michael McCune	755cb1b7b6	expand CAPI_GROUP usage to cover other capi group variables This change updates the logic for the clusterapi autoscaler provider so that the `CAPI_GROUP` environment variable will also affect the annotations keys for minimum and maximum node group size, the machine annotation, machine deletion, and the cluster name label. It also adds unit tests and an update to the readme.	2021-11-09 16:22:36 -05:00
Maciek Pytel	08d18a7bd0	Define interfaces for per NodeGroup config. This is the first step of implementing https://github.com/kubernetes/autoscaler/issues/3583#issuecomment-743215343. New method was added to cloudprovider interface. All existing providers were updated with a no-op stub implementation that will result in no behavior change. The config values specified per NodeGroup are not yet applied.	2021-01-25 11:00:16 +01:00
Bartłomiej Wróblewski	0fb897b839	Update imports after scheduler scheduler/framework/v1alpha1 removal	2020-11-30 10:48:52 +00:00
Jason DeTiberus	06e5f6a0ed	Update group identifier to use for Cluster API annotations - Also add backwards compatibility for the previously used deprecated annotations	2020-09-21 10:42:46 -04:00
Jason DeTiberus	75b850718f	Add node autodiscovery to cluster-autoscaler clusterapi provider	2020-08-20 16:08:49 -04:00
Jason DeTiberus	18d44fc532	Convert clusterapi provider to use unstructured Remove internal types for Cluster API and replace with unstructured access	2020-07-21 15:49:03 -04:00
Michael McCune	abbb26a93c	Improve delete node mechanisms in cluster-autoscaler CAPI provider This change adds a function to remove the annotations associated with marking a node for deletion. It also adds logic to unmark a node in the event that an error is returned after the node has been annotated but before it has been removed. In the case where a node cannot be removed (eg due to minimum size), the node is unmarked before we return from the error condition.	2020-06-03 15:05:58 -04:00
Enxebre	dac1f7d47e	Compare against minSize in deleteNodes() in cluster-autoscaler CAPI provider When calling deleteNodes() we should fail early if the operation could delete nodes below the nodeGroup minSize(). This is one in a series of PR to mitigate kubernetes#3104	2020-06-02 14:48:48 -04:00
Enxebre	9c8b78aa79	Get replicas always from API server for cluster-autoscaler CAPI provider When getting Replicas() the local struct in the scalable resource might be stale. To mitigate possible side effects, we always want to get fresh replicas. This is one in a series of PR to mitigate kubernetes#3104	2020-06-02 14:45:58 -04:00
Michael McCune	f1407a1b50	Add mutex to DeleteNodes in cluster-autoscaler CAPI provider This change adds a mutex to the MachineController structure which is used to gate access to the DeleteNodes function. This is one in a series of PRs to mitigate kubernetes#3104	2020-06-02 13:58:47 -04:00
Kubernetes Prow Robot	0f504d38c5	Merge pull request #3057 from JoelSpeed/external-node-ids CAPI: Do not normalize Node IDs outside of CAPI provider	2020-05-27 07:28:40 -07:00
Jakub Tużnik	73a5cdf928	Address recent breaking changes in scheduler The following things changed in scheduler and needed to be fixed: * NodeInfo was moved to schedulerframework * Some fields on NodeInfo are now exposed directly instead of via getters * NodeInfo.Pods is now a list of schedulerframework.PodInfo, not apiv1.Pod * SharedLister and NodeInfoLister were moved to schedulerframework * PodLister was removed	2020-04-24 17:54:47 +02:00
Joel Speed	5e0126ada5	Do not normalize Node IDs outside of CAPI provider	2020-04-16 10:32:27 +01:00
Andrew McDermott	d9e3197daa	Normalize providerID values We index on providerID but it turns out that those values on node and machine are not always consistent. Some encode region, some do not, for example. This commit normalizes all values through the normalizedProviderString(). To ensure that we catch all places I've introduced a new type and made the find() functions take this new type in lieu of a string. Unit tests have also been adjusted to introduce a 'test:///' prefix on the providerID value to further validate the change. This change allows CAPI to work out-of-the-box, assuming v1alpha2. It's also reasonable to assert that this consistency should be enforced elsewhere and to make this behaviour easily revertable I'm leaving this as a separate commit in this patch series.	2020-03-10 10:59:05 +00:00
Joel Speed	eae1579100	Ensure DeleteNodes doesn't delete a node twice	2020-03-10 10:59:05 +00:00
Enxebre	699c0b83b4	Let Nodes() return the list of all machines The autoscaler expects provider implementations nodeGroups to implement the Nodes() function to return the number of instances belonging to the group regardless of whether they have become a kubernetes node or not. This information is then used for instance to detect unregistered nodes `bf3a9fb52e/cluster-autoscaler/clusterstate/clusterstate.go (L307-L311)`	2020-03-10 10:59:05 +00:00
Andrew McDermott	46bb9b4f29	cloudprovider/clusterapi: new provider This adds a new cloudprovider based on the cluster-api project: https://github.com/kubernetes-sigs/cluster-api	2020-03-10 10:59:04 +00:00

34 Commits