# USING KOPS WITH PRIVATE NETWORKING AND A BASTION HOST IN A HIGHLY-AVAILABLE SETUP

## WHAT WE WANT TO ACCOMPLISH HERE?

The exercise described in this document will focus on the following goals:

- Demonstrate how to use a production setup with 3 masters and two workers in different availability zones.
- Demonstrate how to use a private networking setup with a bastion host.
- Ensure our masters are deployed on 3 different AWS availability zones.
- Ensure our nodes are deployed on 2 different AWS availability zones.
- Add true high availability to the bastion instance group.

## PRE-FLIGHT CHECK:

Please follow our [basic-requirements document](basic-requirements.md) that is common for all our exercises. Ensure the basic requirements are covered before continuing.

## AWS/KOPS ENVIRONMENT SETUP:

First, using some scripting and assuming you already configured your "aws" environment on your linux system, use the following commands in order to export your AWS access/secret (this will work if you are using the default profile):

```bash
export AWS_ACCESS_KEY_ID=`grep aws_access_key_id ~/.aws/credentials|awk '{print $3}'`
export AWS_SECRET_ACCESS_KEY=`grep aws_secret_access_key ~/.aws/credentials|awk '{print $3}'`
echo "$AWS_ACCESS_KEY_ID $AWS_SECRET_ACCESS_KEY"
```

If you are using multiple profiles (and not the default one), you should use the following command instead in order to export your profile:

```bash
export AWS_PROFILE=name_of_your_profile
```

Create a bucket (if you don't already have one) for your cluster state:

```bash
aws s3api create-bucket --bucket my-kops-s3-bucket-for-cluster-state --region us-east-1
```

Then export the name of your cluster along with the "S3" URL of your bucket:

```bash
export NAME=privatekopscluster.k8s.local
export KOPS_STATE_STORE=s3://my-kops-s3-bucket-for-cluster-state
```

Some things to note from here:

- "NAME" will be an environment variable that we'll use from now on to refer to our cluster name. For this practical exercise, our cluster name is "privatekopscluster.k8s.local".
- Because we'll use gossip DNS instead of a valid DNS domain on the AWS Route 53 service, our cluster name needs to include the string **".k8s.local"** at the end (this is covered in our AWS tutorials). You can see more about this in our [Getting Started Doc.](../getting_started/aws.md)

## KOPS PRIVATE CLUSTER CREATION:

Let's first create our cluster, ensuring a multi-master setup with 3 masters in a multi-az setup, two worker nodes also in a multi-az setup, and using both private networking and a bastion server:

```bash
kops create cluster \
--cloud=aws \
--master-zones=us-east-1a,us-east-1b,us-east-1c \
--zones=us-east-1a,us-east-1b,us-east-1c \
--node-count=2 \
--topology private \
--networking kopeio-vxlan \
--node-size=t3.micro \
--master-size=t3.micro \
${NAME}
```
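At this point nothing has actually been created in AWS: without the "--yes" flag, "kops create cluster" only records the cluster definition in the S3 state store. If you want to take a quick look at what was generated before we go through the individual flags, the following optional commands print the stored spec (just a sanity check, not a required step):

```bash
# Dump the cluster spec that kops just wrote to the state store
kops get cluster ${NAME} -o yaml

# List the instance groups that were generated (the 3 masters and one "nodes" group)
kops get ig --name ${NAME}
```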
- The "--master-zones=us-east-1a,us-east-1b,us-east-1c" KOPS argument will actually enforce we want 3 masters here. "--node-count=2" only applies to the worker nodes (not the masters). Again, real "HA" on Kubernetes control plane requires 3 masters. - The "--topology private" argument will ensure that all our instances will have private IP's and no public IP's from amazon. - We are including the arguments "--node-size" and "master-size" to specify the "instance types" for both our masters and worker nodes. - Because we are just doing a simple LAB, we are using "t3.micro" machines. Please DON'T USE t3.micro on real production systems. Start with "t3.medium" as a minimum realistic/workable machine type. - And finally, the "--networking kopeio-vxlan" argument. With the private networking model, we need to tell kOps which networking subsystem to use. More information about kOps supported networking models can be obtained from the [KOPS Kubernetes Networking Documentation](../networking.md). For this exercise we'll use "kopeio-vxlan" (or "kopeio" for short). **NOTE**: You can add the "--bastion" argument here if you are not using "gossip dns" and create the bastion from start, but if you are using "gossip-dns" this will make this cluster to fail (this is a bug we are correcting now). For the moment don't use "--bastion" when using gossip DNS. We'll show you how to get around this by first creating the private cluster, then creation the bastion instance group once the cluster is running. With those points clarified, let's deploy our cluster: ```bash kops update cluster ${NAME} --yes ``` Go for a coffee or just take a 10~15 minutes walk. After that, the cluster will be up-and-running. We can check this with the following commands: ```bash kops validate cluster Using cluster from kubectl context: privatekopscluster.k8s.local Validating cluster privatekopscluster.k8s.local INSTANCE GROUPS NAME ROLE MACHINETYPE MIN MAX SUBNETS master-us-east-1a Master t3.micro 1 1 us-east-1a master-us-east-1b Master t3.micro 1 1 us-east-1b master-us-east-1c Master t3.micro 1 1 us-east-1c nodes Node t3.micro 2 2 us-east-1a,us-east-1b,us-east-1c NODE STATUS NAME ROLE READY ip-172-20-111-44.ec2.internal master True ip-172-20-44-102.ec2.internal node True ip-172-20-53-10.ec2.internal master True ip-172-20-64-151.ec2.internal node True ip-172-20-74-55.ec2.internal master True Your cluster privatekopscluster.k8s.local is ready ``` The ELB created by kOps will expose the Kubernetes API trough "https" (configured on our ~/.kube/config file): ```bash grep server ~/.kube/config server: https://api-privatekopscluster-k8-djl5jb-1946625559.us-east-1.elb.amazonaws.com ``` But, all the cluster instances (masters and worker nodes) will have private IP's only (no AWS public IP's). Then, in order to reach our instances, we need to add a "bastion host" to our cluster. ## ADDING A BASTION HOST TO OUR CLUSTER. We mentioned earlier that we can't add the "--bastion" argument to our "kops create cluster" command if we are using "gossip dns" (a fix it's on the way as we speaks). That forces us to add the bastion afterwards, once the cluster is up and running. Let's add a bastion here by using the following command: ```bash kops create instancegroup bastions --role Bastion --subnet utility-us-east-1a --name ${NAME} ``` **Explanation of this command:** - This command will add to our cluster definition a new instance group called "bastions" with the "Bastion" role on the aws subnet "utility-us-east-1a". 
Note that the "Bastion" role need the first letter to be a capital (Bastion=ok, bastion=not ok). - The subnet "utility-us-east-1a" was created when we created our cluster the first time. KOPS add the "utility-" prefix to all subnets created on all specified AZ's. In other words, if we instructed kOps to deploy our instances on us-east-1a, use-east-1b and use-east-1c, kOps will create the subnets "utility-us-east-1a", "utility-us-east-1b" and "utility-us-east-1c". Because we need to tell kOps where to deploy our bastion (or bastions), we need to specify the subnet. You'll see the following output in your editor when you can change your bastion group size and add more networks. ```yaml apiVersion: kops.k8s.io/v1alpha2 kind: InstanceGroup metadata: name: bastions spec: image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200907 machineType: t3.micro maxSize: 1 minSize: 1 role: Bastion subnets: - utility-us-east-1a ``` If want a H.A. setup for your bastions, modify minSize and maxSize and add more subnets. We'll do this later on this exercise. Save this and deploy the changes: ```bash kops update cluster ${NAME} --yes ``` You will see an output like the following: ```bash I0828 13:06:33.153920 16528 apply_cluster.go:420] Gossip DNS: skipping DNS validation I0828 13:06:34.686722 16528 executor.go:91] Tasks: 0 done / 116 total; 40 can run I0828 13:06:36.181677 16528 executor.go:91] Tasks: 40 done / 116 total; 26 can run I0828 13:06:37.602302 16528 executor.go:91] Tasks: 66 done / 116 total; 34 can run I0828 13:06:39.116916 16528 launchconfiguration.go:327] waiting for IAM instance profile "bastions.privatekopscluster.k8s.local" to be ready I0828 13:06:49.761535 16528 executor.go:91] Tasks: 100 done / 116 total; 9 can run I0828 13:06:50.897272 16528 executor.go:91] Tasks: 109 done / 116 total; 7 can run I0828 13:06:51.516158 16528 executor.go:91] Tasks: 116 done / 116 total; 0 can run I0828 13:06:51.944576 16528 update_cluster.go:247] Exporting kubecfg for cluster kOps has set your kubectl context to privatekopscluster.k8s.local Cluster changes have been applied to the cloud. Changes may require instances to restart: kops rolling-update cluster ``` This is "kOps" creating the instance group with your bastion instance. Let's validate our cluster: ```bash kops validate cluster Using cluster from kubectl context: privatekopscluster.k8s.local Validating cluster privatekopscluster.k8s.local INSTANCE GROUPS NAME ROLE MACHINETYPE MIN MAX SUBNETS bastions Bastion t3.micro 1 1 utility-us-east-1a master-us-east-1a Master t3.micro 1 1 us-east-1a master-us-east-1b Master t3.micro 1 1 us-east-1b master-us-east-1c Master t3.micro 1 1 us-east-1c nodes Node t3.micro 2 2 us-east-1a,us-east-1b,us-east-1c NODE STATUS NAME ROLE READY ip-172-20-111-44.ec2.internal master True ip-172-20-44-102.ec2.internal node True ip-172-20-53-10.ec2.internal master True ip-172-20-64-151.ec2.internal node True ip-172-20-74-55.ec2.internal master True Your cluster privatekopscluster.k8s.local is ready ``` Our bastion instance group is there. 
Also, kOps created an ELB for our "bastions" instance group that we can check with the following command:

```bash
aws elb --output=table describe-load-balancers|grep DNSName.\*bastion|awk '{print $4}'
bastion-privatekopscluste-bgl0hp-1327959377.us-east-1.elb.amazonaws.com
```

For this LAB, the "ELB" FQDN is "bastion-privatekopscluste-bgl0hp-1327959377.us-east-1.elb.amazonaws.com". We can "ssh" to it:

```bash
ssh -i ~/.ssh/id_rsa ubuntu@bastion-privatekopscluste-bgl0hp-1327959377.us-east-1.elb.amazonaws.com

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Mon Aug 28 18:07:16 2017 from 172.20.0.238
```

Because we really want to use an ssh-agent, start it first (this will launch the agent and export the environment variables it needs into your current shell):

```bash
eval `ssh-agent -s`
```

And add your key to the agent with "ssh-add":

```bash
ssh-add ~/.ssh/id_rsa
Identity added: /home/kops/.ssh/id_rsa (/home/kops/.ssh/id_rsa)
```

Then, ssh to your bastion ELB FQDN:

```bash
ssh -A ubuntu@bastion-privatekopscluste-bgl0hp-1327959377.us-east-1.elb.amazonaws.com
```

Or, if you want to automate it:

```bash
ssh -A ubuntu@`aws elb --output=table describe-load-balancers|grep DNSName.\*bastion|awk '{print $4}'`
```

And from the bastion, you can ssh to your masters or workers:

```bash
ubuntu@ip-172-20-2-64:~$ ssh ubuntu@ip-172-20-53-10.ec2.internal

The authenticity of host 'ip-172-20-53-10.ec2.internal (172.20.53.10)' can't be established.
ECDSA key fingerprint is d1:30:c6:5e:77:ff:cd:d2:7d:1f:f9:12:e3:b0:28:e4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ip-172-20-53-10.ec2.internal,172.20.53.10' (ECDSA) to the list of known hosts.

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
ubuntu@ip-172-20-53-10:~$
```

**NOTE:** Remember that you can obtain the local DNS names from your "kops validate cluster" command, or with the "kubectl get nodes" command. We recommend the first (kops validate cluster) because it will tell you which instances are the masters and which are the worker nodes:

```bash
kops validate cluster

Using cluster from kubectl context: privatekopscluster.k8s.local

Validating cluster privatekopscluster.k8s.local

INSTANCE GROUPS
NAME                    ROLE    MACHINETYPE     MIN     MAX     SUBNETS
bastions                Bastion t3.micro        1       1       utility-us-east-1a
master-us-east-1a       Master  t3.micro        1       1       us-east-1a
master-us-east-1b       Master  t3.micro        1       1       us-east-1b
master-us-east-1c       Master  t3.micro        1       1       us-east-1c
nodes                   Node    t3.micro        2       2       us-east-1a,us-east-1b,us-east-1c

NODE STATUS
NAME                            ROLE    READY
ip-172-20-111-44.ec2.internal   master  True
ip-172-20-44-102.ec2.internal   node    True
ip-172-20-53-10.ec2.internal    master  True
ip-172-20-64-151.ec2.internal   node    True
ip-172-20-74-55.ec2.internal    master  True

Your cluster privatekopscluster.k8s.local is ready
```
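If you find the combination of agent forwarding and the long ELB hostname tedious, an alternative is a "ProxyJump" entry in your ssh client configuration. The following sketch uses the ELB FQDN and the "ubuntu" user from this LAB (replace them with yours) and assumes an OpenSSH client new enough to support ProxyJump (7.3 or later):

```bash
# Append a ProxyJump configuration to your ssh client config.
# The hostnames and user below are the ones from this LAB; replace them with yours.
cat >> ~/.ssh/config <<'EOF'
Host kops-bastion
    HostName bastion-privatekopscluste-bgl0hp-1327959377.us-east-1.elb.amazonaws.com
    User ubuntu
    IdentityFile ~/.ssh/id_rsa

Host *.ec2.internal
    User ubuntu
    ProxyJump kops-bastion
EOF

# Now ssh can reach a master or worker directly, jumping through the bastion for you
# (the internal name is resolved on the bastion side):
ssh ip-172-20-53-10.ec2.internal
```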
## MAKING THE BASTION LAYER "HIGHLY AVAILABLE"

If for any reason any "legendary monster from the comics" decides to destroy the Amazon AZ that contains our bastion, we'll basically be unable to reach our instances. Let's add some H.A. to our bastion layer and force Amazon to deploy additional bastion instances on other availability zones.

First, let's edit our "bastions" instance group:

```bash
kops edit ig bastions --name ${NAME}
```

And change minSize/maxSize to 3 (3 instances) and add more subnets:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: privatekopscluster.k8s.local
  name: bastions
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200907
  machineType: t3.micro
  maxSize: 3
  minSize: 3
  role: Bastion
  subnets:
  - utility-us-east-1a
  - utility-us-east-1b
  - utility-us-east-1c
```

Save the changes, and update your cluster:

```bash
kops update cluster ${NAME} --yes
```

**NOTE:** After the update command, you'll see the following recurring error:

```bash
W0828 15:22:46.461033    5852 executor.go:109] error running task "LoadBalancer/bastion.privatekopscluster.k8s.local" (1m5s remaining to succeed): subnet changes on LoadBalancer not yet implemented: actual=[subnet-c029639a] -> expected=[subnet-23f8a90f subnet-4a24ef2e subnet-c029639a]
```

This happens because the original ELB created by kOps only contains the subnet "utility-us-east-1a", and kOps cannot yet apply subnet changes to an existing load balancer. In order to fix this, go to your AWS console and add the remaining subnets to your ELB. After that, the recurring error will disappear and your bastion layer will be fully redundant.

**NOTE:** Always think ahead: if you are creating a fully redundant cluster (with fully redundant bastions), configure the redundancy from the beginning.

Finally, when you are finished playing with kOps, destroy/delete your cluster:

```bash
kops delete cluster ${NAME} --yes
```
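**NOTE:** The delete command above removes every AWS resource kOps created for this cluster. If you'd rather see what is going to be removed before committing, you can run the same command without the "--yes" flag; it only prints the resources that would be deleted:

```bash
# Preview only; nothing is deleted until you add --yes
kops delete cluster ${NAME}
```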