autoscaler

Commit Graph

Author	SHA1	Message	Date
Thalia Wang	70f79316f6	Add VMs implementation to master (#8078)	2025-06-03 14:40:39 -07:00
Rachel Gregory	cecb34cb86	Dynamic listing of SKUs (master) (#7239) * Dynamic listing of SKUs (master) * re-add enableDynamicInstanceList * update README.md	2024-09-06 19:13:07 +01:00
Robin Deeboonchai	4e124296f2	chore: defork cloud-provider-azure	2024-08-19 10:31:21 -07:00
Bryce Soghigian	e48a743079	refactor: refactoring tests and sharing validation between ngfornode + hasinstance	2024-07-31 09:27:40 -07:00
Robin Deeboonchai	2eb6cbeb1f	refactor: upstream most of Azure managed CAS changes in cloudprovider/azure	2024-07-21 11:44:32 -07:00
Kubernetes Prow Robot	3c6dd26d9e	Merge pull request #6863 from rrangith/azure-default-sizes Default min/max sizes for Azure VMSSs	2024-07-01 07:29:35 -07:00
Rahul Rangith	333d438dbf	Default min/max sizes for Azure VMSSs return a struct	2024-06-11 13:55:38 -04:00
wenxuanW	ba6977e7e6	[Azure VMs Pool] Support mixed agentpool types in Azure Cache	2024-05-14 10:03:29 -07:00
prachigandhi	0d5a71e867	review comments - simplify retry logic	2024-03-28 08:52:50 -07:00
Miranda Craghead	6eaa0c3e75	Merged PR 1379: added retry for creatingAzureManager in case of throttled requests; added retry for forceRefresh in case of throttled requests. Ran tests: `MallocNanoZone=0 go test -race k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure` passed. Commented out the following unit test, as it takes 10 minutes to complete: `func TestCreateAzureManagerWithRetryError(t *testing.T) { ctrl := gomock.NewController(t) defer ctrl.Finish() mockVMClient := mockvmclient.NewMockInterface(ctrl) mockVMSSClient := mockvmssclient.NewMockInterface(ctrl) mockVMSSClient.EXPECT().List(gomock.Any(), "fakeId").Return([]compute.VirtualMachineScaleSet{}, retry.NewError(true, errors.New("test"))).AnyTimes() mockAzClient := &azClient{ virtualMachinesClient: mockVMClient, virtualMachineScaleSetsClient: mockVMSSClient, } manager, err := createAzureManagerInternal(strings.NewReader(validAzureCfg), cloudprovider.NodeGroupDiscoveryOptions{}, config.AutoscalingOptions{}, mockAzClient) assert.Nil(t, manager) assert.NotNil(t, err) }`	2024-02-20 15:25:01 -08:00
Jack Francis	9e526aed3e	Azure: Remove AKS vmType Signed-off-by: Jack Francis <jackfrancis@gmail.com>	2023-11-27 07:17:06 -08:00
Benjamin Pineau	20c451bbc0	Azure: effectively cache instance-types SKUs. The skewer library's cache is re-created at every call, which causes pressure on the Azure API and slows down cluster-autoscaler startup by two minutes on my small (120 nodes, 300 VMSS) test cluster. This was also hitting the API twice on cache miss to look for non-promo instance types (even when the instance name doesn't end with "_Promo").	2022-07-25 16:09:36 +02:00
Benjamin Pineau	28cd49c09e	implement GetOptions for Azure. Support per-VMSS (scaledown) settings as permitted by the cloudprovider's interface `GetOptions()` method.	2021-08-24 09:48:51 +02:00
Marwan Ahmed	8d365c2a9c	add missing call to fetch autodiscovered nodegroups	2021-03-23 21:01:23 -07:00
Cecile Robert-Michon	28badba175	cleanup: refactor Azure cache and remove redundant API calls	2020-12-07 11:55:34 -07:00
Marwan Ahmed	5ee545501e	gofmt and update header	2020-08-25 20:59:57 -07:00
Marwan Ahmed	ee176fb4bf	encapsulate autodiscovery utils in a separate file and remove autodiscovery for agentpool mode	2020-08-25 19:45:08 -07:00
Marwan Ahmed	5e68c1274a	refactor azure config into its own file	2020-08-25 18:11:12 -07:00
Benjamin Pineau	c168eed930	Azure: optional jitter on initial VMSS VM cache refresh. On (re)start, cluster-autoscaler will refresh all VMSS instance caches at once, and set those caches' TTL to 5 min. All VMSS VM List calls (for VMSS discovered at boot) will then continuously hit the ARM API at the same time, potentially causing regular throttling bursts. Exposing an optional jitter subtracted from the first scheduled refresh delay will splay those calls (except for the first one, at start), while keeping the predictable (max. 5 min, unless the VMSS changed) refresh interval after the first refresh.	2020-08-19 20:48:28 +02:00
Benjamin Pineau	4997972426	Avoid unwanted VMSS VMs cache invalidations. `fetchAutoAsgs()` is called at regular intervals, fetches a list of VMSS, then calls `Register()` to cache each of those. That registration function tells the caller whether that VMSS' cache is outdated (when the provided VMSS, supposedly fresh, is different from the one held in cache) and replaces the existing cache entry with the provided VMSS (which in effect requires a forced refresh, since that ScaleSet struct is passed by fetchAutoAsgs with a nil lastRefresh time and an empty instanceCache). To detect changes, `Register()` uses a `reflect.DeepEqual()` between the provided and the cached VMSS, which always finds them different: cached VMSS were enriched with instance lists (while the provided one is blank, fresh from a simple vmss.list call). That DeepEqual is also fragile because the compared structs contain mutexes (that may be held or not) and refresh timestamps, attributes that shouldn't be relevant to the comparison. As a consequence, every Register() call causes an indirect cache invalidation and a costly refresh (VMSS VMs List). The number of Register() calls is directly proportional to the number of VMSS attached to the cluster, and can easily trigger ARM API throttling. With a large number of VMSS, that throttling prevents `fetchAutoAsgs` from ever succeeding (and cluster-autoscaler from starting), e.g.: ``` I0807 16:55:25.875907 153 azure_scale_set.go:344] GetScaleSetVms: starts I0807 16:55:25.875915 153 azure_scale_set.go:350] GetScaleSetVms: scaleSet.Name: a-testvmss-10, vmList: [] E0807 16:55:25.875919 153 azure_scale_set.go:352] VirtualMachineScaleSetVMsClient.List failed for a-testvmss-10: &{true 0 2020-08-07 17:10:25.875447854 +0000 UTC m=+913.985215807 azure cloud provider throttled for operation VMSSVMList with reason "client throttled"} E0807 16:55:25.875928 153 azure_manager.go:538] Failed to regenerate ASG cache: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled" F0807 16:55:25.875934 153 azure_cloud_provider.go:167] Failed to create Azure Manager: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled" goroutine 28 [running]: ``` From [`ScaleSet` struct attributes](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_scale_set.go#L74-L89) (manager, sizes, mutexes, refresh timestamps) only sizes are relevant to that comparison. `curSize` is not strictly necessary, but comparing it will provide early instance cache refreshes.	2020-08-18 14:52:02 +02:00
qini	ec7925e868	Add unit tests for azure aks node pool	2020-08-09 21:25:51 +08:00
Julien Balestra	c7483d914b	cluster-autoscaler: ignore nodegroups with min/max tag issues Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>	2020-07-28 11:15:23 +02:00
Maciek Pytel	655b4081f4	Migrate to klog v2	2020-06-05 17:22:26 +02:00
qini	7b494d7daa	Replace fake storages with mock clients in unit tests.	2020-06-05 09:43:19 +08:00
Marc Sensenich	1d6f18ff3c	fixup! Set Azure Rate Limit Defaults from Env Signed-off-by: Marc Sensenich <sensenichm91@gmail.com>	2020-05-09 17:35:47 -04:00
Marc Sensenich	46850966d9	Set Azure Rate Limit Defaults from Env Signed-off-by: Marc Sensenich <sensenichm91@gmail.com>	2020-05-04 22:34:47 -04:00
Ivan Tsai	2d20fc35af	remove the restriction of max size greater than 1	2020-04-17 16:57:36 +08:00
t-qini	a091626a1c	Update vmss client.	2020-03-08 21:25:12 +08:00
Marwan Ahmed	b3f2512a68	remove unused asg cache TTL	2020-02-18 21:17:25 -08:00
Marwan Ahmed	36d869cdb0	make vmss metadata cache TTL configurable	2020-02-18 19:20:55 -08:00
Kubernetes Prow Robot	76bc551af0	Merge pull request #2732 from marwanad/reduce-vmss-calls Decrease the number of calls to VMSS API	2020-01-14 23:03:32 -08:00
Marwan Ahmed	d88a8e5120	Decrease the number of calls to VMSS API	2020-01-14 10:04:10 -08:00
Kubernetes Prow Robot	a99e248f09	Merge pull request #2722 from nilo19/qi-remove-acs-support Provider/Azure: Remove ACS support.	2020-01-09 00:33:44 -08:00
t-qini	1948f0d352	Remove ACS support.	2020-01-09 15:53:55 +08:00
Archangel_SDY	e09dac1a65	Support Azure user assigned identity	2020-01-09 00:15:52 +08:00
Kubernetes Prow Robot	9831fc0404	Merge pull request #2714 from marwanad/fix-az-autodiscovery Fix azure autodiscovery when max tags aren't specified	2020-01-07 00:40:18 -08:00
Kubernetes Prow Robot	f7843fb5d1	Merge pull request #2692 from nilo19/qi-generate-instance Generate azure instance types.	2020-01-07 00:38:19 -08:00
Marwan Ahmed	a03458897a	dont attempt to parse min and max counts through discoverer	2020-01-06 20:05:54 -08:00
t-qini	5765f4b5d2	Generate azure instance types.	2019-12-28 11:18:40 +08:00
t-qini	48cea0f98d	Delete outdated deployments.	2019-12-28 10:18:25 +08:00
Marwan Ahmed	eebb33e325	fix: properly parse azure cloud provider config	2019-12-18 16:51:28 -08:00
Kubernetes Prow Robot	d18a2a8bd4	Merge pull request #2613 from nilo19/qi-mitigate-issue-2594 Fix Azure VMSS scale down issues when updating target sizes.	2019-12-17 17:29:57 -08:00
t-qini	51e47836c6	Clear the content of ss.DecreaseTargetSize and shorten the regeneration time of asg cache.	2019-12-17 09:08:34 +08:00
Jose Armesto	c38a151ca2	Use default caching for Instance Metadata	2019-11-26 13:29:08 +01:00
Jose Armesto	23322d1892	Sort imports	2019-11-25 10:53:07 +01:00
Jose Armesto	f26e3d51b0	Read subscription ID from instance metadata	2019-11-19 16:36:58 +01:00
Ace Eldeib	88d93a9f16	feat: azure vmss min/max autodiscovery	2019-10-17 04:54:03 -07:00
Łukasz Osipiuk	016bf7fc2c	Use k8s.io/klog instead of github.com/golang/glog	2018-11-26 17:30:31 +01:00
Pengfei Ni	7bf7863829	Update Azure clients to be compatible with Kubernetes v1.11	2018-06-11 13:54:12 +08:00
Marcin Wielgus	f0907eb5ad	Merge pull request #813 from kkmsft/azure-containerservice Autoscaler for Azure Container Service (AKS and ACS)	2018-05-08 01:08:35 +02:00

69 Commits