`session.New` is deprecated and requires the `AWS_SDK_LOAD_CONFIG`
environment variable to be set in order to automatically call
`AssumeRoleWithWebIdentity` when `AWS_WEB_IDENTITY_TOKEN_FILE` is set
(which is not documented and most likely unintended).
Ensures that when MixedInstancePolicy is used in an AWS AutoScalingGroup, that
the buildInstanceType() AWS manager method returns an instance type after looking
at the MixedInstancePolicy.LaunchTemplateSpecification. The buildInstanceType()
method is called in numerous places including on cluster scale up actions.
Also adds documentation highlighting the minimum version of cluster autoscaler
supporting MixedInstancePolicy is 1.14
Replicated changes from kubernetes "Add AWS Custom Endpoint capability #70588" into cluster-autoscaler:
- Modified aws_manager snd aws_manager_test similar to kubernetes aws and aws_test.
Fix error format strings according to best practices from CodeReviewComments
Fix error format strings according to best practices from CodeReviewComments
Reverted incorrect change to with error format string
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
Resolve conflict
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
Fix error strings in testscases to remedy failing tests
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
Fix more error strings to remedy failing tests
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
This patch sets and then resets the `AWS_REGION` environment variable in
tests that call `createAWSManagerInternal`, which calls `getRegion`.
`getRegion` introduced `AWS_REGION` lookup at runtime, but tests that
don't (a) run inside AWS or (b) set `AWS_REGION` environment variable
encountered AWS SDK's EC2 Metadata lookup timeout. Tests that indirectly
invoked `getRegion`, e.g., via `createAWSManagerInternal`, saw at least
a 5-second timeout per invocation, ballooning test times.
Tackles kubernetes/autoscaler#1208. EC2 Metadata helpfully supplies AWS
Region info, so rather than _requiring_ an environment variable, this
patch enables `aws_manager.go` to retrieve that information from EC2
itself.
Example YAMLs no longer reference the vestigial `AWS_REGION` environment
variable, but supplying it to `cluster-autoscaler` still relays it to
the AWS SDK like before.
* Validate taint key with regex
Before attempting to perform a split on the tag value, validate it
matches the format <tag>:NoSchedule
This should prevent invalid tag keys from causing the autoscaler to
panic and crash
* Further taint validation
Also add test cases for valid taint methods
Instead of doing auto-discovery, nodes per ASG and target size per ASG separately, fetch all
of those ASG details in a single request on each Refresh() with maximum 10 seconds granularity.
Node group discovery is now handled by cloudprovider.Refresh() in all cases.
Additionally, explicit node groups can now be used alongside autodiscovery.
By caching AWS refs for nodes/EC2 instances already known to be not in any of ASGs managed by cluster-autoscaler(CA).
Please beware of the edge case - this method is safe as long as users don't attach nodes by calling AttachInstances API after CA cached them. I believe, even if it was necessary, a warning in the documentation about the edge case is enough for now. If we really need to support the case, I will submit an another PR to invalidate the cache periodically so that CA can detect the formerly cached nodes are attached to ASG(s).
Also refactor AwsManager for less complexity by extracting types, accordingly to the discussion made [here](https://github.com/kubernetes/autoscaler/pull/46#discussion_r117912687)
This is an alternative implementation of https://github.com/kubernetes/contrib/pull/1982
Notable differences from the original PR are:
* A new flag named `--node-group-auto-discovery` is introduced for opting in to enable the auto-discovery feature.
* For example, specifying `--cloud-provider aws --node-group-auto-discovery asg:tag=k8s.io/cluster-autoscaler/enabled` instructs CA to auto-discover ASGs tagged with `k8s.io/cluster-autoscaler/enabled` to be used as target node groups
* The new code path introduced by this PR is executed only when `node-group-auto-discovery` is specified. There is relatively less chance to break existing features by introducing this change
Resolves https://github.com/kubernetes/contrib/issues/1956
---
Other notes:
* We rely mainly on the `DescribeTags` API rather than `DescribeAutoScalingGroups` so that AWS can filter out unnecessary ASGs which doesn't belong to the k8s cluster, for us.
* If we relied on `DescribeAutoScalingGroups` here, as it doesn't support `Filter`ing, we'd need to iterate over ALL the ASGs available in an AWS account, which isn't desirable due to unnecessary excessive API calls and network usages
* Update cloudprovider/aws/README for the new configuration
* Warn abount invalid combination of flags
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113713138
* Emit a validation error when both --nodes and --node-group-auto-discovery are specified
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113958080
TODO/Possible future improvements before recommending this to everyone:
* Cache the result of an auto-discovery for a configurable period, so that we won't invoke DescribeTags and DescribeAutoScalingGroup APIs too many times