This change adds a Managed Nodegroup cache that will hold labels and taints from the AWS EKS DescribeNodegroup API output. It will be used to get more information for EKS managed nodegroups that are scaled to 0 nodes. Currently this code will only run when the managed nodegroup has 0 nodes and CAS doesn't have a node info object cached already.
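As a rough illustration of the cache described above, here is a minimal sketch. It is not the code from this PR: the type names, the `describeFunc` placeholder standing in for the EKS DescribeNodegroup call, and the cluster/nodegroup key format are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Taint is a simplified stand-in for a Kubernetes taint (hypothetical type).
type Taint struct {
	Key, Value, Effect string
}

// nodegroupInfo holds the fields the cache keeps from a DescribeNodegroup response.
type nodegroupInfo struct {
	Labels map[string]string
	Taints []Taint
}

// describeFunc is a placeholder for a wrapper around the EKS DescribeNodegroup API.
type describeFunc func(clusterName, nodegroupName string) (nodegroupInfo, error)

// managedNodegroupCache stores labels and taints per managed nodegroup.
type managedNodegroupCache struct {
	mu       sync.Mutex
	entries  map[string]nodegroupInfo
	describe describeFunc
}

func newManagedNodegroupCache(describe describeFunc) *managedNodegroupCache {
	return &managedNodegroupCache{entries: map[string]nodegroupInfo{}, describe: describe}
}

// get returns the cached entry and calls describe only on a cache miss.
func (c *managedNodegroupCache) get(clusterName, nodegroupName string) (nodegroupInfo, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	key := clusterName + "/" + nodegroupName
	if info, ok := c.entries[key]; ok {
		return info, nil
	}
	info, err := c.describe(clusterName, nodegroupName)
	if err != nil {
		return nodegroupInfo{}, fmt.Errorf("describing nodegroup %s: %w", key, err)
	}
	c.entries[key] = info
	return info, nil
}

func main() {
	calls := 0
	cache := newManagedNodegroupCache(func(cluster, ng string) (nodegroupInfo, error) {
		calls++ // count how often the (fake) API is hit
		return nodegroupInfo{
			Labels: map[string]string{"team": "batch"},
			Taints: []Taint{{Key: "dedicated", Value: "batch", Effect: "NoSchedule"}},
		}, nil
	})
	for i := 0; i < 2; i++ {
		info, err := cache.get("demo-cluster", "demo-ng")
		if err != nil {
			panic(err)
		}
		fmt.Println(info.Labels["team"], len(info.Taints), "describe calls:", calls)
	}
}
```

The second lookup is served from the map, so the fake API is only called once. Expiry and invalidation are left out of this sketch.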
Not included in this PR, but information for the future:
To use this cache whenever the nodegroup is scaled to 0 nodes, we'd have to change the general CAS code [around here](10451c2032/cluster-autoscaler/processors/nodeinfosprovider/mixed_nodeinfos_processor.go (L114)).
This general code change would be related to discussion in this old PR about node cache info: https://github.com/kubernetes/autoscaler/pull/4258
This change updates the conditional to check for the cluster-name label as well, since we need both for the DescribeNodegroup API call and a customer can accidentally delete either one.
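A minimal sketch of such a conditional, assuming the two values arrive in a plain string map. The key names `cluster-name` and `nodegroup-name` and the function name are placeholders for illustration, not necessarily the identifiers used in the PR.

```go
package main

import "fmt"

// Placeholder keys: the exact names used by the PR are not shown in this description.
const (
	clusterNameKey   = "cluster-name"
	nodegroupNameKey = "nodegroup-name"
)

// managedNodegroupIdentity returns both names, or an error if either one is
// missing or empty, because DescribeNodegroup needs both of them.
func managedNodegroupIdentity(labels map[string]string) (clusterName, nodegroupName string, err error) {
	clusterName = labels[clusterNameKey]
	nodegroupName = labels[nodegroupNameKey]
	if clusterName == "" {
		return "", "", fmt.Errorf("missing %q", clusterNameKey)
	}
	if nodegroupName == "" {
		return "", "", fmt.Errorf("missing %q", nodegroupNameKey)
	}
	return clusterName, nodegroupName, nil
}

func main() {
	complete := map[string]string{clusterNameKey: "prod", nodegroupNameKey: "workers"}
	partial := map[string]string{nodegroupNameKey: "workers"} // cluster name was deleted

	for _, labels := range []map[string]string{complete, partial} {
		cluster, nodegroup, err := managedNodegroupIdentity(labels)
		if err != nil {
			fmt.Println("skipping managed nodegroup lookup:", err)
			continue
		}
		fmt.Println("would describe nodegroup", nodegroup, "in cluster", cluster)
	}
}
```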
This change is the first change for the AWS EKS Managed Nodegroups support for scale-to-0 changes in the cluster autoscaler. It checks for the EKS-specific tags that we automatically add for Managed Nodegroups.
Variable name update in test

Co-authored-by: Guy Templeton <guyjtempleton@googlemail.com>
And at the same time only set stable labels in all buildGenericLabels
implementations.
This fixes issues that occur when a node group has no nodes yet and its node
labels are built using buildGenericLabels and the node-template labels.
Issues include (anti-)affinity and nodeSelectors for the given labels,
giving false-negative results for candidate nodes, which leads to ASGs
never scaling up.
While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports
fetching 100 ASGs per call in all regions, matching what's documented:
https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html
```
AutoScalingGroupNames.member.N
The names of the Auto Scaling groups.
By default, you can only specify up to 50 names.
You can optionally increase this limit using the MaxRecords parameter.
MaxRecords
The maximum number of items to return with this call.
The default value is 50 and the maximum value is 100.
```
Doubling this halves API calls on large clusters, which should help to prevent throttling.
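To make the arithmetic concrete, here is a small sketch of splitting ASG names into batches. The function and constant names are hypothetical, and the actual `DescribeAutoScalingGroups` call is left out.

```go
package main

import "fmt"

// maxASGNamesPerDescribe mirrors the documented MaxRecords maximum of 100.
const maxASGNamesPerDescribe = 100

// chunkNames splits names into consecutive batches of at most size elements.
func chunkNames(names []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var chunks [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		chunks = append(chunks, names[start:end])
	}
	return chunks
}

// summary reports how many describe calls n ASG names need at each batch size.
func summary(n int) string {
	names := make([]string, n)
	return fmt.Sprintf("%d ASGs: %d calls at 50 names per call, %d calls at %d names per call",
		n, len(chunkNames(names, 50)), len(chunkNames(names, maxASGNamesPerDescribe)), maxASGNamesPerDescribe)
}

func main() {
	fmt.Println(summary(250))
}
```

For 250 ASGs this gives 5 batches at 50 names per call and 3 batches at 100.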
Sets the `kubernetes.io/arch` (and legacy `beta.kubernetes.io/arch`) labels
to the proper instance architecture.
While at it, regenerate the instance types list (adding new instance types
that were missing).
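A minimal sketch of deriving both labels from an instance type. The `instanceArch` table is a tiny hypothetical excerpt, and the function name is illustrative rather than taken from the change itself.

```go
package main

import "fmt"

const (
	archLabel       = "kubernetes.io/arch"
	legacyArchLabel = "beta.kubernetes.io/arch"
)

// instanceArch is a small illustrative excerpt; the real list is generated.
var instanceArch = map[string]string{
	"m5.large":  "amd64",
	"m6g.large": "arm64",
}

// archLabelsFor returns the stable and legacy arch labels for an instance type.
func archLabelsFor(instanceType string) (map[string]string, error) {
	arch, ok := instanceArch[instanceType]
	if !ok {
		return nil, fmt.Errorf("unknown instance type %q", instanceType)
	}
	return map[string]string{archLabel: arch, legacyArchLabel: arch}, nil
}

func main() {
	for _, it := range []string{"m5.large", "m6g.large"} {
		labels, err := archLabelsFor(it)
		if err != nil {
			panic(err)
		}
		fmt.Println(it, labels[archLabel], labels[legacyArchLabel])
	}
}
```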
Forcing a full refresh on every DeleteNodes call causes slowdowns
and throttling on large clusters with many ASGs (and a lot of activity).
That function might be called several times in a row during scale-down
(once for each ASG having a node to be removed). Each time the forced
refresh will re-discover all ASGs, all LaunchConfigurations, then re-list all
instances from discovered ASGs.
That immediate refresh isn't required anyway, as the cache's DeleteInstances
concrete implementation will decrement the nodegroup size, and we can
schedule a grouped refresh for the next loop iteration.
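The idea can be sketched as follows. This is an illustration under assumed names (`asgCache`, `deleteInstances`, `refreshIfNeeded`), not the provider's actual types; the expensive re-discovery is represented by a counter.

```go
package main

import "fmt"

// asgCache is a simplified stand-in for the provider's ASG cache.
type asgCache struct {
	sizes        map[string]int // ASG name -> target size
	needsRefresh bool
	refreshCount int // counts simulated full refreshes
}

// deleteInstances decrements the cached size and only marks the cache dirty,
// instead of triggering a full refresh right away.
func (c *asgCache) deleteInstances(asgName string, count int) error {
	size, ok := c.sizes[asgName]
	if !ok {
		return fmt.Errorf("unknown ASG %q", asgName)
	}
	if count > size {
		return fmt.Errorf("cannot remove %d instances from ASG %q of size %d", count, asgName, size)
	}
	c.sizes[asgName] = size - count
	c.needsRefresh = true
	return nil
}

// refreshIfNeeded is meant to run once per autoscaler loop iteration.
func (c *asgCache) refreshIfNeeded() {
	if !c.needsRefresh {
		return
	}
	c.refreshCount++ // stands in for re-discovering ASGs and re-listing instances
	c.needsRefresh = false
}

func main() {
	cache := &asgCache{sizes: map[string]int{"asg-a": 3, "asg-b": 5}}
	// Several deletions during one scale-down pass.
	for _, name := range []string{"asg-a", "asg-b"} {
		if err := cache.deleteInstances(name, 1); err != nil {
			panic(err)
		}
	}
	cache.refreshIfNeeded() // next loop iteration: one grouped refresh
	fmt.Println(cache.sizes["asg-a"], cache.sizes["asg-b"], "refreshes:", cache.refreshCount)
}
```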
`session.New` is deprecated and requires the `AWS_SDK_LOAD_CONFIG`
environment variable to be set in order to automatically call
`AssumeRoleWithWebIdentity` when `AWS_WEB_IDENTITY_TOKEN_FILE` is set
(which is not documented and most likely unintended).
Ensures that when MixedInstancePolicy is used in an AWS AutoScalingGroup,
the buildInstanceType() AWS manager method returns an instance type after looking
at the MixedInstancePolicy.LaunchTemplateSpecification. The buildInstanceType()
method is called in numerous places, including on cluster scale-up actions.
Also adds documentation highlighting that the minimum version of cluster autoscaler
supporting MixedInstancePolicy is 1.14.
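The fallback order can be sketched roughly like this. All types here are simplified, hypothetical stand-ins for the AWS SDK structures, and the lookup maps replace the real API calls; only the idea of consulting the mixed instances policy's launch template comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// launchTemplate identifies a launch template version (simplified).
type launchTemplate struct{ Name, Version string }

// mixedInstancesPolicy carries the policy's launch template specification.
type mixedInstancesPolicy struct {
	LaunchTemplate launchTemplate
}

// asg is a simplified Auto Scaling group.
type asg struct {
	Name                    string
	LaunchConfigurationName string
	LaunchTemplate          *launchTemplate
	MixedInstancesPolicy    *mixedInstancesPolicy
}

// instanceTypeResolver uses in-memory maps in place of AWS API lookups.
type instanceTypeResolver struct {
	launchConfigTypes   map[string]string
	launchTemplateTypes map[launchTemplate]string
}

func (r instanceTypeResolver) fromTemplate(lt launchTemplate) (string, error) {
	if t, ok := r.launchTemplateTypes[lt]; ok {
		return t, nil
	}
	return "", fmt.Errorf("launch template %s (version %s) not found", lt.Name, lt.Version)
}

// buildInstanceType checks the launch configuration, then the launch template,
// then the launch template referenced by the mixed instances policy.
func (r instanceTypeResolver) buildInstanceType(g asg) (string, error) {
	switch {
	case g.LaunchConfigurationName != "":
		if t, ok := r.launchConfigTypes[g.LaunchConfigurationName]; ok {
			return t, nil
		}
		return "", fmt.Errorf("launch configuration %q not found", g.LaunchConfigurationName)
	case g.LaunchTemplate != nil:
		return r.fromTemplate(*g.LaunchTemplate)
	case g.MixedInstancesPolicy != nil:
		return r.fromTemplate(g.MixedInstancesPolicy.LaunchTemplate)
	}
	return "", errors.New("unable to determine instance type for ASG " + g.Name)
}

func main() {
	lt := launchTemplate{Name: "workers", Version: "1"}
	r := instanceTypeResolver{launchTemplateTypes: map[launchTemplate]string{lt: "m5.large"}}
	mixed := asg{Name: "mixed-asg", MixedInstancesPolicy: &mixedInstancesPolicy{LaunchTemplate: lt}}
	t, err := r.buildInstanceType(mixed)
	fmt.Println(t, err)
}
```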