autoscaler


Author	SHA1	Message	Date
aleskandro	16cfe50486	[Tests] Update cluster-api provider to use machineTemplate.status.nodeInfo for architecture-aware autoscale from zero kubernetes-sigs/cluster-api#11962 introduced the nodeInfo field for MachineTemplates. Providers can reconcile this field in the status subresource to inform the autoscaler about the architecture and operating system that the MachineTemplate's nodes will run. Previously, we have been implementing this behavior in the cluster autoscaler by leveraging the labels capacity annotation and, as a fallback, default values set in environment variables at cluster-autoscaler deployment time. With this commit, the cluster autoscaler computes the future architecture of a node with the following priority order: - Labels set on existing nodes (when not autoscaling from zero) - Labels set in the labels capacity annotation of the machine template, machine set, and machine deployment - Values in the status.nodeSystemInfo of MachineTemplates - Generic/default labels set in the environment of the cluster autoscaler	2025-09-05 06:58:57 +01:00
Jun Wang	21ca04ae26	Replace capi v1alpha3 with v1beta2 in test cases	2025-08-08 15:16:41 +08:00
Jun Wang	f4c2fdebeb	Add process with apiGroup in capi provider	2025-08-08 15:16:26 +08:00
Kubernetes Prow Robot	1de2160986	Merge pull request #7908 from Preisschild/fix/capi-patch-instead-update CA: Use Patch to Scale clusterapi nodepools	2025-04-03 07:16:48 -07:00
Florian Ströger	ecb572a945	Use Patch to Scale clusterapi nodepools to avoid modification conflicts Issue: https://github.com/kubernetes/autoscaler/issues/7872 Signed-off-by: Florian Ströger <stroeger@youniqx.com>	2025-04-01 08:26:45 +02:00
elmiko	5e1fc195a3	refactor findScalableResourceProviderIDs in clusterapi this change refactors the function so that each distinct machine state can be filtered more easily. the unit tests have been supplemented, but not changed, to ensure that the functionality continues to work as expected. these changes are to help better detect edge cases where machines can be transitioning through the pending phase and might be removed by the autoscaler.	2025-03-26 12:41:09 -04:00
Jack Francis	7b5e10156e	s/nodeHasValidProviderID/isProviderIDNormalized Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2025-03-19 12:30:33 -07:00
Jack Francis	4aa465764c	capi: node and provider ID accounting funcs Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2025-03-19 11:40:19 -07:00
elmiko	71d3595cb7	improve failed machine detection in clusterapi This change makes it so that when a failed machine is found during the `findScalableResourceProviderIDs` it will always gain a normalized provider ID with failure guard prepended. This is to ensure that machines which have gained a provider ID from the infrastructure and then later go into a failed state can be properly removed by the autoscaler when it wants to correct the size of a node group.	2025-03-19 12:34:29 -04:00
elmiko	003e6cd67c	make DecreaseTargetSize more accurate for clusterapi this change ensures that when DecreaseTargetSize is counting the nodes, it does not include any instances which are considered to be pending (i.e. not having a node ref), deleting, or failed. this change will allow the core autoscaler to then decrease the size of the node group accordingly, instead of raising an error. This change also adds some code to the unit tests to make detection of this condition easier.	2025-03-17 19:34:07 -04:00
Michael Weibel	98f948969a	fix(clusterapi): HasInstance/FindMachine with namespace prefix	2024-07-01 13:37:56 +02:00
enxebre	8cfe11c80d	Add benchmark for capi nodeGroup	2024-05-09 16:42:06 +02:00
enxebre	31fdc397fd	Avoid expensive pointer copy in capi nodegroup	2024-05-07 15:47:23 +02:00
Max Fedotov	6c65baa1c6	[clusterapi] Update tests for nodegroups with minSize=maxSize	2024-03-13 18:38:12 +01:00
Matt Boersma	17d2bd968e	[clusterapi] Add support for MachinePools squash	2023-04-12 09:22:49 -06:00
Matt Boersma	6040d29252	[cluster-api] Handle ignored errors	2023-02-28 13:56:29 -07:00
Cameron McAvoy	5713408a4c	clusterapi: track upcoming unprovisioned machines with a temporary providerID to enable detection of exhausted nodegroups	2023-01-12 15:16:32 -06:00
Michael McCune	5c9cc27f75	cleanup unused constants in clusterapi provider this change removes some unused values and adjusts the names in the unit tests to better reflect usage.	2022-09-29 14:22:05 -04:00
Eng Zer Jun	66805969de	test: use `T.Setenv` to set env vars in tests This commit replaces `os.Setenv` with `t.Setenv` in tests. The environment variable is automatically restored to its original value when the test and all its subtests complete. Reference: https://pkg.go.dev/testing#T.Setenv Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-08-18 21:28:18 +08:00
Michael McCune	f02c9972eb	add more caching to clusterapi provider this change adds logic to create informers for the infrastructure machine templates that are discovered during the scale from zero checks. it also adds tests and a slight change to the controller structure to account for the dynamic informer creation.	2022-08-17 16:25:16 -04:00
Michael McCune	1a65fde540	cleanup clusterapi scale from zero implementation This commit is a combination of several commits. Significant details are preserved below. * update functions for resource annotations This change converts some of the functions that look at annotations for resource usage to indicate their usage in the function name. This helps to make room for allowing the infrastructure reference as an alternate source for the capacity information. * migrate capacity logic into a single function This change moves the logic to collect the instance capacity from the TemplateNodeInfo function into a method of the unstructuredScalableResource named InstanceCapacity. This new function is created to house the logic that will decide between annotations and the infrastructure reference when calculating the capacity for the node. * add ability to lookup infrastructure references This change supplements the annotation lookups by adding the logic to read the infrastructure reference if it exists. This is done to determine if the machine template exposes a capacity field in its status. For more information on how this mechanism works, please see the cluster-api enhancement[0]. * add documentation for capi scaling from zero * improve tests for clusterapi scale from zero this change adds functionality to test the dynamic client behavior of getting the infrastructure machine templates. * update README with information about rbac changes this adds more information about the rbac changes necessary for the scale from zero support to work. * remove extra check for scaling from zero since the CanScaleFromZero function checks to see if both CPU and memory are present, there is no need to check a second time. This also adds some documentation to the CanScaleFromZero function to make it clearer what is happening. * update unit test for capi scale from zero adding a few more cases and details to the scale from zero unit tests, including ensuring that the int based annotations do not accept other unit types. [0] https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md	2022-07-22 20:21:32 -04:00
Michael McCune	1d5e0f155a	add user configurable cluster api version This change introduces an environment variable, `CAPI_VERSION`, through which a user can set the API version for the group they are using. This change is being added to address situations where a user might have multiple API versions for the cluster api group and wishes to be explicit about which version is selected. Also adds unit tests and documentation for the new behavior. This change does not break the existing behavior.	2022-02-25 09:46:34 -05:00
Clinton Yeboah	ecfaa6d700	removes deprecated CAPI annotations	2021-11-11 18:56:53 -05:00
Bartłomiej Wróblewski	4550bfe300	Register resources for fake dynamic client in tests	2020-11-30 10:50:27 +00:00
Michael McCune	d8d064f6bc	refactor CAPI controller unit test to use PollImmediate This change removes the `PollImmediateInfinite` calls in the cluster-api controller unit tests in favor of `PollImmediate`. It is being proposed to prevent an edge case where the polling calls would become blocked indefinitely. As we are using fake clients within the unit tests there should be no delay getting a return value, but just in case there is a miss on the poll function the new `PollImmediate` will timeout after 15 seconds.	2020-11-04 16:24:17 -05:00
Jason DeTiberus	06e5f6a0ed	Update group identifier to use for Cluster API annotations - Also add backwards compatibility for the previously used deprecated annotations	2020-09-21 10:42:46 -04:00
Jason DeTiberus	75b850718f	Add node autodiscovery to cluster-autoscaler clusterapi provider	2020-08-20 16:08:49 -04:00
Jason DeTiberus	63f9e40d82	Improve Cluster API tests to work better with constrained resources	2020-08-19 13:31:32 -04:00
Jason DeTiberus	18d44fc532	Convert clusterapi provider to use unstructured Remove internal types for Cluster API and replace with unstructured access	2020-07-21 15:49:03 -04:00
Joel Speed	5e0126ada5	Do not normalize Node IDs outside of CAPI provider	2020-04-16 10:32:27 +01:00
Joel Speed	d23d3a1dd5	Add testing for fake provider IDs	2020-04-02 15:24:57 +01:00
Joel Speed	8283e80da7	Provide fake provider IDs for failed machines	2020-04-02 15:24:15 +01:00
Enxebre	1a16bbf4a9	Let the controller move on if machineDeployments are not available There might be ad hoc environments where machineDeployments are not necessarily available. This lets the controller remain functional in such scenarios.	2020-03-20 15:44:54 +01:00
Michael McCune	7082cfee81	Add the ability to override CAPI group via env variable and discover API version. This change adds detection for an environment variable to specify the group for the clusterapi resources. If the environment variable `CAPI_GROUP` is specified, then it will be used instead of the default. This also decouples the API group from the version and lets the latter be discovered dynamically.	2020-03-16 14:58:54 +01:00
Andrew McDermott	d9e3197daa	Normalize providerID values We index on providerID, but it turns out that those values on node and machine are not always consistent. Some encode region, some do not, for example. This commit normalizes all values through normalizedProviderString(). To ensure that we catch all places, I've introduced a new type and made the find() functions take this new type in lieu of a string. Unit tests have also been adjusted to introduce a 'test:///' prefix on the providerID value to further validate the change. This change allows CAPI to work out-of-the-box, assuming v1alpha2. It's also reasonable to assert that this consistency should be enforced elsewhere, and to make this behaviour easily revertible I'm leaving this as a separate commit in this patch series.	2020-03-10 10:59:05 +00:00
Andrew McDermott	46bb9b4f29	cloudprovider/clusterapi: new provider This adds a new cloudprovider based on the cluster-api project: https://github.com/kubernetes-sigs/cluster-api	2020-03-10 10:59:04 +00:00

36 Commits
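Commit 16cfe50486 describes a priority order for choosing a future node's architecture. That order can be sketched as a first-non-empty lookup. This is a minimal illustration under stated assumptions, not the provider's actual code:

- The archSources struct, its field names, and resolveArch are hypothetical.
- The inputs are assumed to be already parsed. For example, the labels capacity annotation is assumed to be already split into a map.
- Only the well-known `kubernetes.io/arch` node label comes from Kubernetes itself.

```go
package main

import "fmt"

// archLabel is the well-known Kubernetes node label for CPU architecture.
const archLabel = "kubernetes.io/arch"

// archSources bundles the candidate inputs, highest priority first.
// All names here are hypothetical and exist only for this sketch.
type archSources struct {
	ExistingNodeLabels map[string]string // labels of a live node (nil when scaling from zero)
	AnnotationLabels   map[string]string // labels parsed from the capacity annotation
	TemplateStatusArch string            // architecture reported in the MachineTemplate status
	DefaultArch        string            // default taken from the autoscaler's environment
}

// resolveArch returns the first non-empty architecture in priority order.
// Indexing a nil map is safe in Go and yields "".
func resolveArch(s archSources) string {
	if v := s.ExistingNodeLabels[archLabel]; v != "" {
		return v
	}
	if v := s.AnnotationLabels[archLabel]; v != "" {
		return v
	}
	if s.TemplateStatusArch != "" {
		return s.TemplateStatusArch
	}
	return s.DefaultArch
}

func main() {
	// Scale from zero with no annotation: the template status wins over the default.
	fmt.Println(resolveArch(archSources{TemplateStatusArch: "arm64", DefaultArch: "amd64"})) // arm64

	// An annotation label takes precedence over the template status.
	fmt.Println(resolveArch(archSources{
		AnnotationLabels:   map[string]string{archLabel: "s390x"},
		TemplateStatusArch: "arm64",
		DefaultArch:        "amd64",
	})) // s390x
}
```

Each source is consulted only when every higher-priority source is empty. As a result, an environment default can never override what a live node or a MachineTemplate reports.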
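Commit d9e3197daa describes two changes:

- Routing every provider ID through normalizedProviderString().
- Introducing a dedicated type that the find() functions accept, so node and machine values compare equal even when only one of them encodes a location.

The sketch below assumes one plausible normalization rule: keep the last `/`-separated segment. The real function may behave differently. The names normalizedID, normalize, and findByProviderID are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizedID is a distinct string type so that raw and normalized
// provider IDs cannot be mixed up by accident (hypothetical name).
type normalizedID string

// normalize keeps only the final "/"-separated segment of a provider ID.
// Assumption: an illustrative rule, not necessarily the provider's exact logic.
func normalize(providerID string) normalizedID {
	parts := strings.Split(providerID, "/") // always at least one element
	return normalizedID(parts[len(parts)-1])
}

// findByProviderID accepts only normalized IDs, so callers must normalize first.
func findByProviderID(index map[normalizedID]string, id normalizedID) (string, bool) {
	name, ok := index[id]
	return name, ok
}

func main() {
	// Index machines by the normalized form of a provider ID that includes a location segment.
	index := map[normalizedID]string{
		normalize("aws:///us-east-1a/i-0abc"): "machine-a",
	}

	// A node whose provider ID omits the location still matches.
	name, ok := findByProviderID(index, normalize("aws:///i-0abc"))
	fmt.Println(name, ok) // machine-a true

	// The 'test:///' prefix used in the unit tests normalizes away as well.
	fmt.Println(normalize("test:///node-1")) // node-1
}
```

Making the lookup take the dedicated type rather than a plain string lets the compiler flag any call site that forgot to normalize. This mirrors the motivation given in the commit message.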