Merge pull request #1427 from shashidharatd/fed-on-call

Automatic merge from submit-queue.

Update on-call-federation-build-cop.md with current state

/assign @madhusudancs @krzyzacy 
/cc @kubernetes/sig-multicluster-pr-reviews
Kubernetes Submit Queue 2017-11-22 19:09:26 -08:00 committed by GitHub
commit 33e0e9a29b
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 47 additions and 106 deletions


@@ -1,37 +1,39 @@
 # Federation Buildcop Guide and Playbook
-Federation runs two classes of tests: CI and Presubmits.
+Federation runs two classes of tests: CI and Pre-submits.
 ## CI
 * These tests run on the HEADs of master and release branches (starting
-from Kubernetes v1.6).
+from Kubernetes v1.7).
 * As a result, they run on code that's already merged.
 * As the name suggests, they run continuously. Currently, they are
-configured to run
-[at least once every 30 minutes](https://github.com/kubernetes/test-infra/blob/22c38cfb64137086373e1b89d5e7d98766560747/prow/config.yaml#L3686).
-* Federation CI tests run as
-[periodic jobs on prow](https://github.com/kubernetes/test-infra/blob/22c38cfb64137086373e1b89d5e7d98766560747/prow/config.yaml#L3686).
+configured to run at least once every 30 minutes.
+* Federation CI tests run as periodic jobs on prow.
 * CI jobs always run sequentially. In other words, no single CI job
 can have two instances of the job running at the same time.
+* Latest build results can be viewed in [testgrid](https://k8s-testgrid.appspot.com/sig-multicluster)
 ### Configuration
-Configuration steps are described in https://github.com/kubernetes/test-infra#create-a-new-job
-The configuration of CI tests are stored in:
-* Prow config: https://github.com/kubernetes/test-infra/blob/499410989420f724b833b07f797fde38fff58910/prow/config.yaml
-* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/499410989420f724b833b07f797fde38fff58910/jobs/config.json
-* Test grid config: https://github.com/kubernetes/test-infra/blob/499410989420f724b833b07f797fde38fff58910/testgrid/config/config.yaml
-* Job specific config: https://github.com/kubernetes/test-infra/tree/499410989420f724b833b07f797fde38fff58910/jobs/env
+Configuration steps are described in https://github.com/kubernetes/test-infra#create-a-new-job.
+Federation CI e2e job names are as below:
+* master branch - `ci-federation-e2e-gce` and `ci-federation-e2e-gce-serial`
+* 1.8 release branch - `ci-kubernetes-e2e-gce-federation-release-1-8`
+* 1.7 release branch - `ci-kubernetes-e2e-gce-federation-release-1-7`
+Search for the above job names in various configuration files as below:
+* Prow config: https://github.com/kubernetes/test-infra/blob/master/prow/config.yaml
+* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/master/jobs/config.json
+* Test grid config: https://github.com/kubernetes/test-infra/blob/master/testgrid/config/config.yaml
+* Job specific config: https://github.com/kubernetes/test-infra/tree/master/jobs/env
 ### Results
-Results of all the federation CI tests, including the soak tests, are
-listed in the corresponding tabs on the Cluster Federation page in the
-testgrid.
-https://k8s-testgrid.appspot.com/sig-federation
+Results of all the federation CI tests are listed in the corresponding
+tabs on the Cluster Federation page in the testgrid.
+https://k8s-testgrid.appspot.com/sig-multicluster
 ### Playbook
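The hunk above drops the pinned test-infra links. In their place it lists the federation CI job names per branch and asks the reader to search for those names in four configuration locations. Below is a minimal Python sketch of that search. It assumes a local checkout of kubernetes/test-infra with the layout linked above, which may have changed since this commit. `CI_JOBS`, `CONFIG_PATHS`, `find_job_mentions` and `mentions_for_branch` are hypothetical names, and the `release-1.8`-style branch keys are an assumed naming convention.

```python
import re
from pathlib import Path

# Federation CI e2e job names per branch, copied from the updated guide.
# The branch keys ("release-1.8", ...) are an assumed naming convention.
CI_JOBS = {
    "master": ["ci-federation-e2e-gce", "ci-federation-e2e-gce-serial"],
    "release-1.8": ["ci-kubernetes-e2e-gce-federation-release-1-8"],
    "release-1.7": ["ci-kubernetes-e2e-gce-federation-release-1-7"],
}

# Locations the guide says to search, relative to the root of a local
# kubernetes/test-infra checkout (assumed layout, as linked in the guide).
CONFIG_PATHS = [
    "prow/config.yaml",
    "jobs/config.json",
    "testgrid/config/config.yaml",
    "jobs/env",
]


def find_job_mentions(repo_root, job_name):
    """Return (relative path, line number) pairs where job_name occurs.

    Hypothetical helper: a text scan, not a config parser. Line number 0
    means the file itself is named after the job (e.g. jobs/env/<job>.env).
    """
    root = Path(repo_root)
    # Whole-name match: "ci-federation-e2e-gce" must not match inside
    # "ci-federation-e2e-gce-serial".
    pattern = re.compile(r"(?<![\w-])" + re.escape(job_name) + r"(?![\w-])")
    hits = []
    for rel in CONFIG_PATHS:
        target = root / rel
        # Directories (jobs/env) are scanned recursively; files are read directly.
        candidates = sorted(target.rglob("*")) if target.is_dir() else [target]
        for path in candidates:
            if not path.is_file():
                continue  # skips subdirectories and missing files
            rel_path = path.relative_to(root).as_posix()
            if path.stem == job_name:
                hits.append((rel_path, 0))
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((rel_path, lineno))
    return hits


def mentions_for_branch(repo_root, branch):
    """Map each CI job name for a branch to the places it is configured."""
    return {job: find_job_mentions(repo_root, job) for job in CI_JOBS[branch]}
```

For example, `mentions_for_branch("/path/to/test-infra", "master")` returns one list of hits per master job. An empty list suggests the name has moved or been renamed since this guide was written.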
@@ -39,7 +41,7 @@ https://k8s-testgrid.appspot.com/sig-federation
 Please ping someone who has access to the prow project and ask
 them to click the `rerun` button from, for example
-http://prow.k8s.io/?type=periodic&job=ci-kubernetes-e2e-gce-federation,
+http://prow.k8s.io/?type=periodic&job=ci-federation-e2e-gce,
 and execute the kubectl command.
 #### Quota cleanup
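The hunk above changes the example prow link used for the `rerun` step from `ci-kubernetes-e2e-gce-federation` to `ci-federation-e2e-gce`. Below is a small Python sketch that builds the same kind of link for any periodic job. `prow_periodic_url` and `RENAMED_JOBS` are hypothetical names. Treating the link change as a job rename is an assumption drawn only from this diff.

```python
from urllib.parse import urlencode

# prow front page, as linked from the playbook.
PROW_BASE = "http://prow.k8s.io/"

# The playbook link changed from the key to the value in this diff; treating
# that as a job rename (and redirecting old names) is an assumption.
RENAMED_JOBS = {"ci-kubernetes-e2e-gce-federation": "ci-federation-e2e-gce"}


def prow_periodic_url(job_name):
    """Return the prow page URL for a periodic job, where `rerun` is clicked.

    Hypothetical helper: builds the same query string as the playbook link.
    """
    job = RENAMED_JOBS.get(job_name, job_name)
    # urlencode keeps dict order and leaves '-' unescaped, so the result
    # matches the hand-written link exactly.
    return PROW_BASE + "?" + urlencode({"type": "periodic", "job": job})
```

`prow_periodic_url("ci-federation-e2e-gce")` reproduces the link in the updated playbook. Passing `ci-federation-e2e-gce-serial` gives the page for the serial master job.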
@@ -50,115 +52,54 @@ delete button corresponding to those leaked resources on Google Cloud
 Console.
-## Presubmit
+## Pre-submit
-* We only have one presubmit test, but it is configured very
-differently than the CI tests.
-* The presubmit test is currently configured to run on the master
-branch and any release branch that's 1.7 or newer.
-* Federation presubmit infrastructure is composed of two separate test
-jobs:
-* Deploy job: This job runs in the background and recycles federated
-clusters every time it runs. Although this job supports federation
-presubmit tests, it is configured as a CI/Soak job. More on
-configuration later. Since recycling federated clusters is an
-expensive operation, we do not want to run this often. Hence, this
-job is configured to run once every 24 hours, around midnight
-Pacific time.
-* Test job: This is the job that runs federation presubmit tests on
-every PR in the core repository, i.e.
-[kubernetes/kubernetes](https://github.com/kubernetes/kubernetes).
-These jobs can run in parallel on the PRs in the repository.
-### Two-jobs setup
-The deploy job runs once every 24 hours at around midnight Pacific
-time. It is configured to turn up and tear down 3 federated clusters.
-It starts out by downloading the latest Kubernetes release built from
-[kubernetes/kubernetes](https://github.com/kubernetes/kubernetes)
-master. It then tears down the existing federated clusters and turns
-up new ones. As the clusters are created, their kubeconfigs are
-written to a local kubeconfig file where the job runs. Once all the
-clusters are successfully turned up, the local kubeconfig is then
-copied to a pre-configured GCS bucket. Any existing kubeconfig in the
-bucket will be overwritten.
-The test job on the other hand starts by copying the latest kubeconfig
-from the pre-configured GCS bucket. It uses this kubeconfig to deploy
-a new federation control plane on one of the clusters in the
-kubeconfig. It then joins all the clusters in the kubeconfig, including
-the host cluster where federation control plane is deployed, as members
-to the newly created federation control plane. The test job then runs
-the federation presubmit tests on this control plane and tears down the
-control plane in the end.
-Since federated clusters are recycled only once every 24 hours, all
-presubmit runs in that period share the federated clusters. And since
-there could be multiple presubmit tests running in parallel, each
-instance of the test gets its own namespace where it deploys the
-federation control plane. These federation control planes deployed in
-separate namespaces are independent of each other and do not interfere
-with other federation control planes in any way.
+* The pre-submit test is currently configured to run on the master
+branch and any release branch that's 1.9 or newer.
+* Multiple pre-submit jobs could be running in parallel(one per pr).
+* Latest build results can be viewed in [testgrid](https://k8s-testgrid.appspot.com/presubmits-federation)
+* We have following pre-submit jobs in federation
+* bazel-test - Runs all the bazel test targets in federation.
+* e2e-gce - Runs federation e2e tests on gce.
+* verify - Runs federation unit, integration tests and few verify scripts.
 ### Configuration
-The two jobs are configured differently.
-#### Deploy job
-The deploy job is configured as a CI/Soak job in Jenkins.
-Configuration steps are described in https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jenkins/README.md#how-to-work-with-jenkins-jobs
-The configuration of the deploy job is stored in:
-* Jenkins config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jenkins/job-configs/kubernetes-jenkins/bootstrap-ci-soak.yaml#L76
-* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/config.json#L3996
-* Test grid config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/testgrid/config/config.yaml#L152
-* Job specific config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/ci-kubernetes-pull-gce-federation-deploy.env
-#### Test job
-The test job is
-[configured in prow](https://github.com/kubernetes/test-infra/blob/35ceb37e999bb0589218708262634951b79dfe05/prow/config.yaml#L236),
-but it runs in Jenkins mode. The configuration steps are described in
-https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/README.md#create-a-new-job
-The configuration of the test job is stored in:
-* Prow config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/prow/config.yaml#L244
-* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/config.json#L4691
-* Job specific config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/pull-kubernetes-federation-e2e-gce.env
+Configuration steps are described in https://github.com/kubernetes/test-infra#create-a-new-job.
+Federation pre-submit jobs have following names.
+* bazel-test - `pull-federation-bazel-test`
+* verify - `pull-federation-verify`
+* e2e-gce - `pull-federation-e2e-gce`
+Search for the above job names in various configuration files as below:
+* Prow config: https://github.com/kubernetes/test-infra/blob/master/prow/config.yaml
+* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/master/jobs/config.json
+* Test grid config: https://github.com/kubernetes/test-infra/blob/master/testgrid/config/config.yaml
+* Job specific config: https://github.com/kubernetes/test-infra/tree/master/jobs/env
 ### Results
 Aggregated results are available on the Gubernator dashboard page for
-the federation presubmit tests.
-https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-federation-e2e-gce
+the federation pre-submit tests.
+https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-federation-e2e-gce
 ### Metrics
-We track the flakiness metrics of all the presubmit jobs and
+We track the flakiness metrics of all the pre-submit jobs and
 individual tests that run against PRs in
-[kubernetes/kubernetes](https://github.com/kubernetes/kubernetes).
-* The metrics that we track are documented in https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/metrics/README.md#metrics.
-* Job-level metrics are available in - [http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json]().
-* As of this writing, federation presubmits have a [success rate of
-93.4%](http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json).
+[kubernetes/federation](https://github.com/kubernetes/federation).
+* The metrics that we track are documented in https://github.com/kubernetes/test-infra/blob/master/metrics/README.md#metrics.
+* Job-level metrics are available in http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json.
 ### Playbook
-#### Triggering a new deploy job run
-Please ping someone who has access to the Jenkins UI/dashboard and ask
-them to login and click the "Build Now" link on the Jenkins page
-corresponding to the CI job you want to manually start.
-#### Triggering a new test run
-Use the `/test` command on the PR to retrigger the test. The exact
-incantation is: `/test pull-kubernetes-federation-e2e-gce`
+#### Triggering a new run
+Use the `/test` command on the PR to re-trigger the test. The exact
+incantation is: `/test pull-federation-e2e-gce`
 #### Quota cleanup