Federation Buildcop Guide and Playbook

Federation runs two classes of tests: CI and Presubmits.

CI

  • These tests run on the HEADs of master and release branches (starting from Kubernetes v1.6).
  • As a result, they run on code that's already merged.
  • As the name suggests, they run continuously. Currently, they are configured to run at least once every 30 minutes.
  • Federation CI tests run as periodic jobs on prow.
  • CI jobs always run sequentially: a given CI job never has two instances running at the same time.
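
The two scheduling rules above (a run at least once every 30 minutes, and never two instances of the same job at once) can be illustrated with a small decision function. This is only a sketch of the rules as stated here; `should_start` and its parameters are hypothetical names and do not describe how prow actually schedules periodic jobs.

```python
from datetime import datetime, timedelta
from typing import Optional

# Interval taken from the text above; everything else is a hypothetical illustration.
CI_INTERVAL = timedelta(minutes=30)


def should_start(now: datetime, last_start: Optional[datetime], running: bool,
                 interval: timedelta = CI_INTERVAL) -> bool:
    """Return True if a new instance of a periodic CI job may start at `now`."""
    if running:
        # Sequential rule: never start while a previous instance is still running.
        return False
    if last_start is None:
        # The job has never run, so it is due immediately.
        return True
    # Otherwise the job is due once a full interval has elapsed.
    return now - last_start >= interval
```

A run that takes longer than the interval simply delays the next one; it is never overlapped.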

Configuration

Configuration steps are described in 0c56d2c9d3/jenkins/README.md (how-to-work-with-jenkins-jobs)

The configuration of the CI tests is stored in:

Results

Results of all the federation CI tests, including the soak tests, are listed in the corresponding tabs on the Cluster Federation page in testgrid: https://k8s-testgrid.appspot.com/sig-federation

Playbook

Triggering a new run

Please ping someone who has access to the Jenkins UI/dashboard and ask them to log in and click the "Build Now" link on the Jenkins page corresponding to the CI job you want to start manually.

Quota cleanup

Please ping someone who has access to the GCP project. Ask them to look at the quotas and delete the leaked resources by clicking the delete button corresponding to those leaked resources on Google Cloud Console.

Presubmit

  • We have only one presubmit test, but it is configured very differently from the CI tests.
  • The presubmit test is currently configured to run on the master branch and any release branch that's 1.7 or newer.
  • Federation presubmit infrastructure is composed of two separate test jobs:
    • Deploy job: This job runs in the background and recycles federated clusters every time it runs. Although this job supports federation presubmit tests, it is configured as a CI/Soak job. More on configuration later. Since recycling federated clusters is an expensive operation, we do not want to run this often. Hence, this job is configured to run once every 24 hours, around midnight Pacific time.
    • Test job: This is the job that runs federation presubmit tests on every PR in the core repository, i.e. kubernetes/kubernetes. These jobs can run in parallel on the PRs in the repository.

Two-jobs setup

The deploy job runs once every 24 hours at around midnight Pacific time. It is configured to turn up and tear down 3 federated clusters. It starts out by downloading the latest Kubernetes release built from kubernetes/kubernetes master. It then tears down the existing federated clusters and turns up new ones. As the clusters are created, their kubeconfigs are written to a local kubeconfig file where the job runs. Once all the clusters are successfully turned up, the local kubeconfig is then copied to a pre-configured GCS bucket. Any existing kubeconfig in the bucket will be overwritten.
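
The ordering described above matters: the shared kubeconfig is published only after every cluster has come up. A minimal sketch of that sequence follows. All four callables are hypothetical stand-ins for the real tooling, and the kubeconfig is modeled as a plain dictionary purely for illustration.

```python
from typing import Callable, Dict, List


def run_deploy_job(
    cluster_names: List[str],
    download_release: Callable[[], str],
    tear_down: Callable[[str], None],
    turn_up: Callable[[str, str], str],
    upload: Callable[[Dict[str, str]], None],
) -> Dict[str, str]:
    """Recycle the federated clusters and publish their kubeconfig entries.

    Every callable is a hypothetical placeholder, not a real API.
    """
    # Step 1: fetch the latest release built from kubernetes/kubernetes master.
    release = download_release()

    # Step 2: tear down the existing federated clusters.
    for name in cluster_names:
        tear_down(name)

    # Step 3: turn up new clusters, collecting each kubeconfig entry locally.
    kubeconfig: Dict[str, str] = {}
    for name in cluster_names:
        kubeconfig[name] = turn_up(name, release)

    # Step 4: publish only after all clusters are up. If turn_up raised above,
    # this line is never reached and the bucket keeps its previous kubeconfig.
    upload(kubeconfig)
    return kubeconfig
```

In this sketch a failed turn-up leaves the previously uploaded kubeconfig in place, even though the clusters it points to have already been torn down.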

The test job, on the other hand, starts by copying the latest kubeconfig from the pre-configured GCS bucket. It uses this kubeconfig to deploy a new federation control plane on one of the clusters in the kubeconfig. It then joins all the clusters in the kubeconfig, including the host cluster where the federation control plane is deployed, as members of the newly created federation control plane. The test job then runs the federation presubmit tests on this control plane and tears down the control plane at the end.
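
The test job's lifecycle can be sketched in the same style. Again, every callable is a hypothetical placeholder for the real tooling, and choosing the host cluster by sorted name is an assumption made only to keep the example deterministic. The `try`/`finally` reflects the requirement that the control plane is torn down at the end of the run.

```python
from typing import Callable, Dict


def run_test_job(
    fetch_kubeconfig: Callable[[], Dict[str, str]],
    deploy_control_plane: Callable[[str], str],
    join_cluster: Callable[[str, str], None],
    run_tests: Callable[[str], bool],
    tear_down_control_plane: Callable[[str], None],
) -> bool:
    """Run the presubmit tests against a freshly deployed control plane."""
    # Copy the latest kubeconfig published by the deploy job.
    kubeconfig = fetch_kubeconfig()

    # Assumption: any cluster may act as host; pick one deterministically.
    host = sorted(kubeconfig)[0]
    control_plane = deploy_control_plane(host)
    try:
        # Join every cluster, including the host itself, as a member.
        for cluster in kubeconfig:
            join_cluster(control_plane, cluster)
        return run_tests(control_plane)
    finally:
        # Tear down the control plane whether the tests passed, failed, or raised.
        tear_down_control_plane(control_plane)
```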

Since federated clusters are recycled only once every 24 hours, all presubmit runs in that period share the federated clusters. And since there could be multiple presubmit tests running in parallel, each instance of the test gets its own namespace where it deploys the federation control plane. These federation control planes deployed in separate namespaces are independent of each other and do not interfere with other federation control planes in any way.
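
One way to give each parallel run its own namespace is to derive the name from identifiers that are unique to the run. The sketch below is an assumption for illustration: the `federation-presubmit` prefix, the function name, and the use of a PR number plus a build ID are hypothetical and are not taken from the actual job configuration. The only hard constraint it encodes is that a Kubernetes namespace name must be a DNS-1123 label (lowercase alphanumerics and `-`, at most 63 characters, starting and ending with an alphanumeric).

```python
import re

MAX_LABEL_LEN = 63  # Kubernetes namespace names are DNS-1123 labels


def presubmit_namespace(pr_number: int, build_id: str,
                        prefix: str = "federation-presubmit") -> str:
    """Build a per-run namespace name (hypothetical naming scheme)."""
    raw = f"{prefix}-{pr_number}-{build_id}".lower()
    # Replace every run of characters that a DNS-1123 label does not allow.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", raw)
    # Truncate to the maximum length, then drop leading or trailing hyphens.
    return cleaned[:MAX_LABEL_LEN].strip("-")
```

Two runs on the same PR get different namespaces as long as their build IDs differ. Very long prefixes would be truncated and could collide, so a real scheme would need to keep the unique part within the first 63 characters.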

Configuration

The two jobs are configured differently.

Deploy job

The deploy job is configured as a CI/Soak job in Jenkins. Configuration steps are described in 0c56d2c9d3/jenkins/README.md (how-to-work-with-jenkins-jobs)

The configuration of the deploy job is stored in:

Test job

The test job is configured in prow, but it runs in Jenkins mode. The configuration steps are described in 0c56d2c9d3/README.md (create-a-new-job)

The configuration of the test job is stored in:

Results

Aggregated results are available on the Gubernator dashboard page for the federation presubmit tests.

https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-federation-e2e-gce

Metrics

We track the flakiness metrics of all the presubmit jobs and individual tests that run against PRs in kubernetes/kubernetes.

Playbook

Triggering a new deploy job run

Please ping someone who has access to the Jenkins UI/dashboard and ask them to log in and click the "Build Now" link on the Jenkins page corresponding to the deploy job.

Triggering a new test run

Use the /test command on the PR to retrigger the test. The exact incantation is: /test pull-kubernetes-federation-e2e-gce

Quota cleanup

Please ping someone who has access to the k8s-jkns-pr-bldr-e2e-gce-fdrtn GCP project. Ask them to look at the quotas and delete the leaked resources by clicking the delete button corresponding to those leaked resources on Google Cloud Console.
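
When asking for a cleanup, it can help to say which quotas are close to exhaustion. The following is a minimal sketch, assuming the input is the parsed JSON output of `gcloud compute project-info describe --project k8s-jkns-pr-bldr-e2e-gce-fdrtn --format=json`, whose `quotas` list carries `metric`, `limit`, and `usage` fields. The function name and the 80% threshold are illustrative choices and are not part of the federation tooling. It only reports project-wide quotas; it does not find or delete leaked resources.

```python
from typing import Any, Dict, List, Tuple


def quotas_near_limit(project_info: Dict[str, Any],
                      threshold: float = 0.8) -> List[Tuple[str, float, float]]:
    """Return (metric, usage, limit) for quotas at or above `threshold` of their limit."""
    hot: List[Tuple[str, float, float]] = []
    for quota in project_info.get("quotas", []):
        limit = float(quota.get("limit", 0))
        usage = float(quota.get("usage", 0))
        # Skip quotas with a zero limit to avoid dividing by zero.
        if limit > 0 and usage / limit >= threshold:
            hot.append((quota["metric"], usage, limit))
    # Most exhausted quota first.
    return sorted(hot, key=lambda q: q[1] / q[2], reverse=True)
```

The person running it needs read access to the project; the resulting list can be pasted into the cleanup request.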