# 2022-10-03 - 300 pods per node test

## Results

- a 6-node RKE2 cluster is set up with a limit of 300 pods per node (1800 pods maximum cluster capacity)
- an `nginx` workload at 90% pod capacity (270 pods per node / 1620 pods total) is scheduled and health checks stay green
- some day-2 operations are performed on the cluster, with the workload staying healthy (after a transitory period) at every step:
  - addition (uncordoning) of one node
  - full workload redeployment
  - removal (draining) of one node
  - upgrade of RKE2 on all cluster nodes
- CPU overhead at rest (kubelet, container runtime, etc.): ~15% (of a 4-vCPU 3.1 GHz Intel Xeon)

The test is completely automated, takes ~1 hour, and costs ~2 USD in AWS resources.

[Screen captures and logs](https://mysuse-my.sharepoint.com/:f:/g/personal/moio_suse_com/Esqw92qeMS5AkAwn6PaV-O0B8tADbJ9rK9zRjqL-yKIGMQ?e=OUbMaz) are available to SUSE employees.

## Methodology notes

- 300 pods per node was chosen because it is higher than 254 pods per node, the maximum possible with the default /24 CIDR range
- each node was assigned a /22 CIDR range, as [Google recommends](https://cloud.google.com/kubernetes-engine/docs/how-to/flexible-pod-cidr#cidr_ranges_for_clusters) having at least twice as many available IP addresses as pods
- workload pods were scheduled up to 90% of the maximum (270 pods per node), which leaves room for 30 system pods per node while staying above 254
- the downstream cluster is imported because AWS provisioning [is not supported with temporary session tokens at the time of writing](https://github.com/rancher/rancher/issues/15962)
- the downstream cluster's RKE2 upgrade is performed via the [installation script method](https://github.com/rancher/rke2/blob/v1.25.2%2Brke2r1/docs/upgrade/basic_upgrade.md#upgrade-rke2-using-the-installation-script) because it is the only option for imported clusters at the time of writing
- upgrades via installation scripts imply draining the nodes being upgraded, so one extra node is added for that purpose
- bugs not related to scalability have been worked around

## Test outline

- infrastructure is set up:
  - AWS hardware (VMs, network devices...) is deployed
  - RKE2 is installed on upstream cluster nodes, and Rancher is installed on top of it
  - RKE2 is installed on downstream cluster nodes
- test is conducted:
  - initial admin user is set up ([detail](../cypress/cypress/e2e/users.cy.js))
  - downstream cluster is imported into Rancher ([detail](../cypress/cypress/e2e/imported-clusters.cy.js))
  - an nginx deployment ([detail](../cypress/cypress/e2e/deployment.yaml)) is applied ([detail](../cypress/cypress/e2e/workloads.cy.js))
  - deployment replicas are scaled up to 1620
  - one worker node is drained
  - deployment is recycled
  - the worker node is uncordoned
  - deployment is recycled again
  - RKE2 is updated, node by node ([detail](../cypress/cypress/e2e/rke2-update.cy.js))
  - logs are collected

## AWS hardware configuration outline

- bastion host (for SSH tunnelling only): `t4g.small`, 50 GiB EBS `gp3` root volume
- Rancher cluster: 3 nodes, `t3.large`, 50 GiB EBS `gp3` root volume
- downstream cluster: 6 to 7 nodes, `t3.xlarge`, 50 GiB EBS `gp3` root volume
- networking: one /16 AWS VPC with two /24 subnets
  - public subnet: contains the one bastion host, which exposes port 22 to the Internet via security groups
  - private subnet: contains all other nodes; traffic is allowed only internally and to/from the bastion via SSH

References:
- [instance types](https://aws.amazon.com/ec2/instance-types/)
- [EBS](https://aws.amazon.com/ebs/)
- [VPC](https://aws.amazon.com/vpc/)

## Software configuration outline

- bastion host: SLES 15 SP4
- Rancher cluster: Rancher 2.6.5 on a 3-node RKE2 v1.23.10+rke2r1 cluster
  - all nodes based on Rocky Linux 8.6
- downstream cluster: RKE2 v1.22.13+rke2r1, 3 server nodes and 4 agent nodes
  - all nodes based on Rocky Linux 8.6

The 300 pods per node limit is set via:
- adding a `kube-controller-manager-arg: "node-cidr-mask-size=22"` line to `/etc/rancher/rke2/config.yaml`
- adding a `kubelet-arg: "config=/etc/rancher/rke2/kubelet-custom.config"` line to `/etc/rancher/rke2/config.yaml`
- creating a `/etc/rancher/rke2/kubelet-custom.config` file with the following content:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 300
```
- restarting the `rke2-server` service

See [the rke2 installation script in this repo](../rke2/install_rke2.sh) for details.

## Full configuration details

All infrastructure is defined via [Terraform](https://www.terraform.io/) files in the [20221003_300_pods_per_node](https://github.com/moio/scalability-tests/tree/20221003_300_pods_per_node) branch. Note in particular [inputs.tf](../inputs.tf) for the main parameters.

All tests are driven by [Cypress](https://www.cypress.io/) files in the [cypress](https://github.com/moio/scalability-tests/tree/20221003_300_pods_per_node/cypress) directory.
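
Once a downstream cluster is up, the effective per-node limits can be read back from the Kubernetes API. Below is a minimal sketch, assuming the downstream kubeconfig path produced by this repository's Terraform output (`./config/downstream.yaml`); adjust the path if yours differs:

```shell
# Point kubectl at the downstream cluster
export KUBECONFIG=./config/downstream.yaml

# Each node should report a pod capacity of 300 and a /22 pod CIDR
kubectl get nodes -o custom-columns='NAME:.metadata.name,MAXPODS:.status.capacity.pods,PODCIDR:.spec.podCIDR'
```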

## Reproduction instructions

### Requirements

- API access to EC2 configured for your terminal
  - for SUSE Engineering:
    - [have "AWS Landing Zone" added to your Okta account](https://confluence.suse.com/display/CCOE/Requesting+AWS+Access)
    - open [Okta](https://suse.okta.com/) -> "AWS Landing Zone"
    - click on "AWS Account" -> your account -> "Command line or programmatic access" -> click to copy the commands under "Option 1: Set AWS environment variables"
    - paste the contents into your terminal
- [Terraform](https://www.terraform.io/downloads)
- `git`
- `node`
- `kubectl`
- `nc` (netcat)

### Setup

- clone this project:
```shell
git clone https://github.com/moio/scalability-tests.git
cd scalability-tests
git checkout 20221003_300_pods_per_node
```
- initialize Terraform and Cypress:
```shell
terraform init
pushd cypress
npm install cypress --save-dev
popd # return to the repository root, where `terraform apply` is run
```

### Run

```shell
# deploy all infrastructure
terraform apply -auto-approve

# run tests
cd cypress; ./node_modules/cypress/bin/cypress run
```

### Inspect results

- `cypress/cypress/logs` contains log tarballs collected from all downstream nodes
- `cypress/cypress/videos` contains screen captures of test runs
- Terraform output contains instructions to access the newly created clusters, e.g.:
```
UPSTREAM CLUSTER ACCESS:
  export KUBECONFIG=./config/upstream.yaml

RANCHER UI:
  https://upstream.local.gd:3000

DOWNSTREAM CLUSTER ACCESS:
  export KUBECONFIG=./config/downstream.yaml
```
- `./config` contains scripts to SSH into any VM and to reopen SSH tunnels to ports 3000 (Web UI), 6443 (upstream cluster API) and 6444 (downstream cluster API)
- Cypress tests can be run interactively via:

```shell
cd cypress
npx cypress open
```

### Cleanup

All created infrastructure can be destroyed via:
```shell
terraform destroy -auto-approve
```

## Troubleshooting

If the error below is produced:
```
Error: creating EC2 Instance: VcpuLimitExceeded: You have requested more vCPU capacity than your current vCPU limit of 32 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.
```
then you need to request a higher vCPU limit for your account from AWS. This can be done by visiting the [Service Quotas](https://console.aws.amazon.com/servicequotas/home) page and filling in the details to request an increase of the "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances" limit.
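
The same increase can also be requested from the command line. This is a sketch only, assuming the AWS CLI is configured for the same account; the quota code used below is the one generally associated with this limit, but it is worth confirming it via the lookup first:

```shell
# Look up the quota code for the Standard On-Demand instance vCPU limit
aws service-quotas list-service-quotas --service-code ec2 \
  --query "Quotas[?QuotaName=='Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances'].QuotaCode" \
  --output text

# Request an increase to 64 vCPUs
# (L-1216C47A is assumed to be the code returned by the command above)
aws service-quotas request-service-quota-increase \
  --service-code ec2 --quota-code L-1216C47A --desired-value 64
```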