Cluster Autoscaler on AliCloud
The cluster autoscaler on AliCloud scales worker nodes within any specified autoscaling group. It runs as a Deployment in your cluster. This README covers the steps required to get the cluster autoscaler up and running.
Kubernetes Version
The cluster autoscaler requires Kubernetes v1.9.3 or later.
Instance Type Support
- Standard Instance: x86 architecture, suitable for common scenarios such as websites or API services.
- GPU/FPGA Instance: heterogeneous computing, suitable for high-performance computing.
- Bare Metal Instance: combines the elasticity of a virtual server with the high performance and comprehensive features of a physical server.
- Spot Instance: spot instances are on-demand instances designed to reduce your ECS costs in some cases.
 
ACS Console Deployment
doc: https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/auto-scaling-of-nodes
Custom Deployment
1. Prepare identity authentication
Use access-key-id and access-key-secret
apiVersion: v1
kind: Secret
metadata:
  name: cloud-config
  namespace: kube-system
data:
  # insert your base64 encoded Alicloud access id and key here, ensure there's no trailing newline:
  # such as:  echo -n "your_access_key_id" | base64
  access-key-id: "<BASE64_ACCESS_KEY_ID>"
  access-key-secret: "<BASE64_ACCESS_KEY_SECRET>"
  region-id: "<BASE64_REGION_ID>"
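The encoding step described in the comments of the Secret above can also be scripted. What follows is a minimal sketch in Python, assuming a placeholder region ID; the helper name encode_secret_value is hypothetical and not part of any AliCloud or Kubernetes tooling. It also shows why the trailing newline matters: the newline is encoded along with the value and produces a different string.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a string exactly as given, for a Secret's data field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder value; substitute your own access key ID, secret and region.
    region = "cn-hangzhou"
    print(encode_secret_value(region))
    # A stray trailing newline is encoded too and changes the result.
    print(encode_secret_value(region + "\n"))
```

The printed strings can be pasted into the access-key-id, access-key-secret and region-id fields in place of the placeholders.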
Use STS with RAM Role
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "ess:Describe*",
        "ess:CreateScalingRule",
        "ess:ModifyScalingGroup",
        "ess:RemoveInstances",
        "ess:ExecuteScalingRule",
        "ess:ModifyScalingRule",
        "ess:DeleteScalingRule",
        "ess:DetachInstances",
        "ecs:DescribeInstanceTypes"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow"
    }
  ]
}
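Before attaching a policy like the one above to a RAM role, it can help to check that it covers the actions you expect. The following is a rough sketch in Python, not an official tool: the function names are hypothetical, and shell-style wildcard matching is used as a stand-in for RAM's own policy evaluation, which may differ (for example in case sensitivity, Deny statements or Resource scoping).

```python
import json
from fnmatch import fnmatchcase
from typing import Iterable, List


def allowed_patterns(policy: dict) -> List[str]:
    """Collect the Action patterns from every Allow statement."""
    patterns: List[str] = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(actions)
    return patterns


def missing_actions(policy: dict, needed: Iterable[str]) -> List[str]:
    """Return the needed actions that no Allow pattern matches."""
    patterns = allowed_patterns(policy)
    return [a for a in needed if not any(fnmatchcase(a, p) for p in patterns)]


if __name__ == "__main__":
    # Shortened copy of the policy above, for illustration only.
    policy = json.loads("""
    {
      "Version": "1",
      "Statement": [
        {
          "Action": ["ess:Describe*", "ess:RemoveInstances"],
          "Resource": ["*"],
          "Effect": "Allow"
        }
      ]
    }
    """)
    # Example action names to check against the shortened policy.
    needed = ["ess:DescribeScalingGroups", "ecs:DescribeInstanceTypes"]
    print(missing_actions(policy, needed))
```

Run against the full policy from this section, the list of missing actions should be empty for every action the policy names.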
2. ASG Setup
- Create a Scaling Group in ESS (https://essnew.console.aliyun.com) with valid configurations.
- Create a Scaling Configuration for this Scaling Group with a valid instanceType and User Data. In User Data, you can specify a script that initializes the environment and joins the node to the Kubernetes cluster. If your Kubernetes cluster is hosted by ACS, you can use an attach script like this:
 
#!/bin/sh
# The token is generated by ACS console. https://www.alibabacloud.com/help/doc-detail/64983.htm?spm=a2c63.l28256.b99.33.46395ad54ozJFq
curl http://aliacs-k8s-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/public/pkg/run/attach/[kubernetes_cluster_version]/attach_node.sh | bash -s -- --openapi-token [token] --ess true 
3. cluster-autoscaler deployment
Use access-key-id and access-key-secret
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: admin
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/acs/autoscaler:v1.3.1.2
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=alicloud
            - --nodes=[min]:[max]:[ASG_ID]
          imagePullPolicy: "Always"
          env:
          - name: ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: cloud-config
                key: access-key-id
          - name: ACCESS_KEY_SECRET
            valueFrom:
              secretKeyRef:
                name: cloud-config
                key: access-key-secret
          - name: REGION_ID
            valueFrom:
              secretKeyRef:
                name: cloud-config
                key: region-id
          volumeMounts:
            - name: ssl-certs
              mountPath: /etc/ssl/certs/ca-certificates.crt
              readOnly: true
      volumes:
        - name: ssl-certs
          hostPath:
            path: "/etc/ssl/certs/ca-certificates.crt"
Use STS with RAM Role
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: admin
      containers:
        - image: registry.cn-hangzhou.aliyuncs.com/acs/autoscaler:v1.3.1.2
          name: cluster-autoscaler
          resources:
            limits:
              cpu: 100m
              memory: 300Mi
            requests:
              cpu: 100m
              memory: 300Mi
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=alicloud
            - --nodes=[min]:[max]:[ASG_ID]
          imagePullPolicy: "Always"
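Both Deployments pass the scaling group through --nodes=[min]:[max]:[ASG_ID], and a malformed value is an easy mistake to make when filling in the placeholders. The sketch below, in Python, checks a value before it goes into the manifest. The names NodeGroupSpec and parse_nodes_spec and the ID asg-example are hypothetical, and the checks only mirror the format shown above rather than the autoscaler's own validation.

```python
from typing import NamedTuple


class NodeGroupSpec(NamedTuple):
    min_size: int
    max_size: int
    asg_id: str


def parse_nodes_spec(spec: str) -> NodeGroupSpec:
    """Parse '<min>:<max>:<ASG_ID>' and apply basic sanity checks."""
    parts = spec.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected <min>:<max>:<ASG_ID>, got {spec!r}")
    min_text, max_text, asg_id = parts
    try:
        min_size, max_size = int(min_text), int(max_text)
    except ValueError:
        raise ValueError("min and max must be integers") from None
    if min_size < 0:
        raise ValueError("min must not be negative")
    if max_size < min_size:
        raise ValueError("max must be greater than or equal to min")
    if not asg_id:
        raise ValueError("ASG_ID must not be empty")
    return NodeGroupSpec(min_size, max_size, asg_id)


if __name__ == "__main__":
    # Placeholder scaling group ID; use the ID shown in the ESS console.
    spec = parse_nodes_spec("1:10:asg-example")
    print(f"--nodes={spec.min_size}:{spec.max_size}:{spec.asg_id}")
```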
Auto-Discovery Setup
Auto-discovery is not currently supported on AliCloud.
Common Notes and Gotchas:
- The /etc/ssl/certs/ca-certificates.crt file should exist by default on your ECS instance.
- By default, cluster autoscaler will not terminate nodes running pods in the kube-system namespace. You can override this default behaviour by passing in the --skip-nodes-with-system-pods=false flag.
- By default, cluster autoscaler will wait 10 minutes between scale down operations. You can adjust this using the --scale-down-delay flag, e.g. --scale-down-delay=5m to decrease the scale down delay to 5 minutes.
- If you're running multiple ASGs, the --expander flag supports three options: random, most-pods and least-waste. random will expand a random ASG on scale up. most-pods will scale up the ASG that will schedule the most pods. least-waste will expand the ASG that will waste the least amount of CPU/MEM resources. In the event of a tie, cluster-autoscaler will fall back to random.
- If you're managing your own kubelets, they need to be started with the --provider-id flag.
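The three --expander options described in the notes above can be pictured with a small model. The following Python sketch is a simplified illustration, not the autoscaler's actual code: the Candidate fields, the choose_asg function and the ASG names are all hypothetical, and each ASG's "waste" is reduced to a single number for clarity.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    asg_id: str             # hypothetical ASG identifier
    pods_scheduled: int     # pending pods that would fit after scaling this ASG
    wasted_fraction: float  # share of added CPU/MEM left unused (0.0 to 1.0)


def choose_asg(candidates: List[Candidate], expander: str,
               rng: Optional[random.Random] = None) -> Candidate:
    """Pick an ASG to expand; ties are broken randomly."""
    if not candidates:
        raise ValueError("no candidate ASGs")
    rng = rng or random.Random()
    if expander == "random":
        return rng.choice(candidates)
    if expander == "most-pods":
        best = max(c.pods_scheduled for c in candidates)
        tied = [c for c in candidates if c.pods_scheduled == best]
    elif expander == "least-waste":
        least = min(c.wasted_fraction for c in candidates)
        tied = [c for c in candidates if c.wasted_fraction == least]
    else:
        raise ValueError(f"unknown expander: {expander!r}")
    return rng.choice(tied)


if __name__ == "__main__":
    groups = [
        Candidate("asg-a", pods_scheduled=3, wasted_fraction=0.40),
        Candidate("asg-b", pods_scheduled=5, wasted_fraction=0.25),
        Candidate("asg-c", pods_scheduled=5, wasted_fraction=0.10),
    ]
    for name in ("random", "most-pods", "least-waste"):
        print(name, "->", choose_asg(groups, name).asg_id)
```

In this model most-pods ties between asg-b and asg-c and picks one of them at random, while least-waste always selects asg-c.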