69 Commits

Author SHA1 Message Date
Thalia Wang 70f79316f6
Add VMs implementation to master (#8078) 2025-06-03 14:40:39 -07:00
Rachel Gregory cecb34cb86
Dynamic listing of SKUs (master) (#7239)
* Dynamic listing of SKUs (master)

* re-add enableDynamicInstanceList

* update README.md
2024-09-06 19:13:07 +01:00
Robin Deeboonchai 4e124296f2 chore: defork cloud-provider-azure 2024-08-19 10:31:21 -07:00
Bryce Soghigian e48a743079 refactor: refactoring tests and sharing validation between ngfornode + hasinstance 2024-07-31 09:27:40 -07:00
Robin Deeboonchai 2eb6cbeb1f refactor: upstream most of Azure managed CAS changes in cloudprovider/azure 2024-07-21 11:44:32 -07:00
Kubernetes Prow Robot 3c6dd26d9e
Merge pull request #6863 from rrangith/azure-default-sizes
Default min/max sizes for Azure VMSSs
2024-07-01 07:29:35 -07:00
Rahul Rangith 333d438dbf
Default min/max sizes for Azure VMSSs
return a struct
2024-06-11 13:55:38 -04:00
wenxuanW ba6977e7e6 [Azure VMs Pool] Support mixed agentpool types in Azure Cache 2024-05-14 10:03:29 -07:00
prachigandhi 0d5a71e867 review comments - simplify retry logic 2024-03-28 08:52:50 -07:00
Miranda Craghead 6eaa0c3e75 Merged PR 1379: added retry for creatingAzureManager in case of throttled requests
added retry for forceRefresh in case of throttled requests
ran tests
MallocNanoZone=0 go test -race k8s.io/autoscaler/cluster-autoscaler/cloudprovider/azure -- passed

and commented out unit test -- commented out as it takes 10 minutes to complete

func TestCreateAzureManagerWithRetryError(t *testing.T) {
	ctrl := gomock.NewController(t)
	defer ctrl.Finish()
	mockVMClient := mockvmclient.NewMockInterface(ctrl)
	mockVMSSClient := mockvmssclient.NewMockInterface(ctrl)
	mockVMSSClient.EXPECT().List(gomock.Any(), "fakeId").Return([]compute.VirtualMachineScaleSet{}, retry.NewError(true, errors.New("test"))).AnyTimes()
	mockAzClient := &azClient{
		virtualMachinesClient:         mockVMClient,
		virtualMachineScaleSetsClient: mockVMSSClient,
	}
	manager, err := createAzureManagerInternal(strings.NewReader(validAzureCfg), cloudprovider.NodeGroupDiscoveryOptions{}, config.AutoscalingOptions{}, mockAzClient)
	assert.Nil(t, manager)
	assert.NotNil(t, err)
}
2024-02-20 15:25:01 -08:00
Jack Francis 9e526aed3e Azure: Remove AKS vmType
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2023-11-27 07:17:06 -08:00
Benjamin Pineau 20c451bbc0 Azure: effectively cache instance-types SKUs
The skewer library's cache is re-created at every call, which puts
pressure on the Azure API, and slows down the cluster-autoscaler startup
time by two minutes on my small (120 nodes, 300 VMSS) test cluster.

This was hitting the API twice on cache miss to look for non-promo
instance types (even when the instance name doesn't end with "_Promo").
2022-07-25 16:09:36 +02:00
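The two fixes described in commit 20c451bbc0 (build the SKU cache once, and only fall back to the non-promo name when the name actually ends with "_Promo") can be illustrated with a small sketch. It does not use the skewer library; `skuCache`, `fetch`, and the vCPU map are hypothetical stand-ins chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// skuCache is a hypothetical long-lived cache of instance-type data
// (here: vCPU count by SKU name), filled by a single call to fetch.
type skuCache struct {
	mu      sync.Mutex
	loaded  bool
	skus    map[string]int
	fetch   func() (map[string]int, error) // stand-in for the API listing call
	fetches int                            // number of times fetch succeeded
}

// get returns the vCPU count for name, loading the cache on first use only.
func (c *skuCache) get(name string) (int, bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.loaded {
		skus, err := c.fetch()
		if err != nil {
			return 0, false, err
		}
		c.skus, c.loaded = skus, true
		c.fetches++
	}
	if v, ok := c.skus[name]; ok {
		return v, true, nil
	}
	// Retry with the non-promo name only when the suffix is present.
	if base := strings.TrimSuffix(name, "_Promo"); base != name {
		v, ok := c.skus[base]
		return v, ok, nil
	}
	return 0, false, nil
}

func main() {
	cache := &skuCache{fetch: func() (map[string]int, error) {
		return map[string]int{"Standard_D2s_v3": 2}, nil
	}}
	for _, name := range []string{"Standard_D2s_v3", "Standard_D2s_v3_Promo", "Standard_Unknown"} {
		vcpus, ok, _ := cache.get(name)
		fmt.Println(name, vcpus, ok)
	}
	fmt.Println("fetches:", cache.fetches) // fetches: 1
}
```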
Benjamin Pineau 28cd49c09e implement GetOptions for Azure
Support per-VMSS (scaledown) settings as permitted by the
cloudprovider's interface `GetOptions()` method.
2021-08-24 09:48:51 +02:00
Marwan Ahmed 8d365c2a9c add missing call to fetch autodiscovered nodegroups 2021-03-23 21:01:23 -07:00
Cecile Robert-Michon 28badba175
cleanup: refactor Azure cache and remove redundant API calls 2020-12-07 11:55:34 -07:00
Marwan Ahmed 5ee545501e gofmt and update header 2020-08-25 20:59:57 -07:00
Marwan Ahmed ee176fb4bf encapsulate autodiscovery utils in a separate file and remove autodiscovery for agentpool mode 2020-08-25 19:45:08 -07:00
Marwan Ahmed 5e68c1274a refactor azure config into its own file 2020-08-25 18:11:12 -07:00
Benjamin Pineau c168eed930 Azure: optional jitter on initial VMSS VM cache refresh
On (re)start, cluster-autoscaler will refresh all VMSS instance caches
at once, and set those caches' TTL to 5 min. All VMSS VM List calls (for VMSS
discovered at boot) will then continuously hit the ARM API at the same time,
potentially causing regular throttling bursts.

Exposing an optional jitter subtracted from the first scheduled
refresh delay will splay those calls (except for the first one, at start),
while keeping the predictable (max. 5 min, unless the VMSS changed) refresh
interval after the first refresh.
2020-08-19 20:48:28 +02:00
Benjamin Pineau 4997972426 Avoid unwanted VMSS VMs caches invalidations
`fetchAutoAsgs()` is called at regular intervals, fetches a list of VMSS,
then calls `Register()` to cache each of those. That registration function
will tell the caller whether that VMSS' cache is outdated (when the provided
VMSS, supposedly fresh, is different than the one held in cache) and will
replace the existing cache entry by the provided VMSS (which in effect will
require a forced refresh since that ScaleSet struct is passed by
fetchAutoAsgs with a nil lastRefresh time and an empty instanceCache).

To detect changes, `Register()` uses a `reflect.DeepEqual()` between the
provided and the cached VMSS, which always finds them different: cached
VMSS were enriched with instance lists (while the provided one is blank,
fresh from a simple vmss.list call). That DeepEqual is also fragile due to
the compared structs containing mutexes (that may be held or not) and
refresh timestamps, attributes that shouldn't be relevant to the comparison.

As a consequence, all Register() calls cause indirect cache invalidations
and a costly refresh (VMSS VMs List). The number of Register() calls is
directly proportional to the number of VMSS attached to the cluster, and
can easily trigger ARM API throttling.

With a large number of VMSS, that throttling prevents `fetchAutoAsgs` from
ever succeeding (and cluster-autoscaler from starting), e.g.:

```
I0807 16:55:25.875907     153 azure_scale_set.go:344] GetScaleSetVms: starts
I0807 16:55:25.875915     153 azure_scale_set.go:350] GetScaleSetVms: scaleSet.Name: a-testvmss-10, vmList: []
E0807 16:55:25.875919     153 azure_scale_set.go:352] VirtualMachineScaleSetVMsClient.List failed for a-testvmss-10: &{true 0 2020-08-07 17:10:25.875447854 +0000 UTC m=+913.985215807 azure cloud provider throttled for operation VMSSVMList with reason "client throttled"}
E0807 16:55:25.875928     153 azure_manager.go:538] Failed to regenerate ASG cache: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"
F0807 16:55:25.875934     153 azure_cloud_provider.go:167] Failed to create Azure Manager: Retriable: true, RetryAfter: 899s, HTTPStatusCode: 0, RawError: azure cloud provider throttled for operation VMSSVMList with reason "client throttled"
goroutine 28 [running]:
```

From [`ScaleSet` struct attributes](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/azure_scale_set.go#L74-L89)
(manager, sizes, mutexes, refresh timestamps) only sizes are relevant
to that comparison. `curSize` is not strictly necessary, but comparing it
will provide early instance cache refreshes.
2020-08-18 14:52:02 +02:00
qini ec7925e868 Add unit tests for azure aks node pool 2020-08-09 21:25:51 +08:00
Julien Balestra c7483d914b cluster-autoscaler: ignore nodegroups with min/max tag issues
Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>
2020-07-28 11:15:23 +02:00
Maciek Pytel 655b4081f4 Migrate to klog v2 2020-06-05 17:22:26 +02:00
qini 7b494d7daa Replace fake storages with mock clients in unit tests. 2020-06-05 09:43:19 +08:00
Marc Sensenich 1d6f18ff3c fixup! Set Azure Rate Limit Defaults from Env
Signed-off-by: Marc Sensenich <sensenichm91@gmail.com>
2020-05-09 17:35:47 -04:00
Marc Sensenich 46850966d9 Set Azure Rate Limit Defaults from Env
Signed-off-by: Marc Sensenich <sensenichm91@gmail.com>
2020-05-04 22:34:47 -04:00
Ivan Tsai 2d20fc35af
remove the restriction of max size greater than 1 2020-04-17 16:57:36 +08:00
t-qini a091626a1c Update vmss client. 2020-03-08 21:25:12 +08:00
Marwan Ahmed b3f2512a68 remove unused asg cache TTL 2020-02-18 21:17:25 -08:00
Marwan Ahmed 36d869cdb0 make vmss metadata cache TTL configurable 2020-02-18 19:20:55 -08:00
Kubernetes Prow Robot 76bc551af0
Merge pull request #2732 from marwanad/reduce-vmss-calls
Decrease the number of calls to VMSS API
2020-01-14 23:03:32 -08:00
Marwan Ahmed d88a8e5120 Decrease the number of calls to VMSS API 2020-01-14 10:04:10 -08:00
Kubernetes Prow Robot a99e248f09
Merge pull request #2722 from nilo19/qi-remove-acs-support
Provider/Azure: Remove ACS support.
2020-01-09 00:33:44 -08:00
t-qini 1948f0d352 Remove ACS support. 2020-01-09 15:53:55 +08:00
Archangel_SDY e09dac1a65 Support Azure user assigned identity 2020-01-09 00:15:52 +08:00
Kubernetes Prow Robot 9831fc0404
Merge pull request #2714 from marwanad/fix-az-autodiscovery
Fix azure autodiscovery when max tags aren't specified
2020-01-07 00:40:18 -08:00
Kubernetes Prow Robot f7843fb5d1
Merge pull request #2692 from nilo19/qi-generate-instance
Generate azure instance types.
2020-01-07 00:38:19 -08:00
Marwan Ahmed a03458897a dont attempt to parse min and max counts through discoverer 2020-01-06 20:05:54 -08:00
t-qini 5765f4b5d2 Generate azure instance types. 2019-12-28 11:18:40 +08:00
t-qini 48cea0f98d Delete outdated deployments. 2019-12-28 10:18:25 +08:00
Marwan Ahmed eebb33e325 fix: properly parse azure cloud provider config 2019-12-18 16:51:28 -08:00
Kubernetes Prow Robot d18a2a8bd4
Merge pull request #2613 from nilo19/qi-mitigate-issue-2594
Fix Azure VMSS scale down issues when updating target sizes.
2019-12-17 17:29:57 -08:00
t-qini 51e47836c6 Clear the content of ss.DecreaseTargetSize and shorten the regeneration time of asg cache. 2019-12-17 09:08:34 +08:00
Jose Armesto c38a151ca2
Use default caching for Instance Metadata 2019-11-26 13:29:08 +01:00
Jose Armesto 23322d1892
Sort imports 2019-11-25 10:53:07 +01:00
Jose Armesto f26e3d51b0
Read subscription ID from instance metadata 2019-11-19 16:36:58 +01:00
Ace Eldeib 88d93a9f16 feat: azure vmss min/max autodiscovery 2019-10-17 04:54:03 -07:00
Łukasz Osipiuk 016bf7fc2c Use k8s.io/klog instead github.com/golang/glog 2018-11-26 17:30:31 +01:00
Pengfei Ni 7bf7863829 Update Azure clients to be compatible with Kubernetes v1.11 2018-06-11 13:54:12 +08:00
Marcin Wielgus f0907eb5ad
Merge pull request #813 from kkmsft/azure-containerservice
Autoscaler for Azure Container Service (AKS and ACS)
2018-05-08 01:08:35 +02:00