Merge pull request #1427 from shashidharatd/fed-on-call

Automatic merge from submit-queue.

Update on-call-federation-build-cop.md with current state

/assign @madhusudancs @krzyzacy 
/cc @kubernetes/sig-multicluster-pr-reviews
Kubernetes Submit Queue 2017-11-22 19:09:26 -08:00 committed by GitHub
commit 33e0e9a29b
1 changed files with 47 additions and 106 deletions


@ -1,37 +1,39 @@
# Federation Buildcop Guide and Playbook
Federation runs two classes of tests: CI and Presubmits.
Federation runs two classes of tests: CI and Pre-submits.
## CI
* These tests run on the HEADs of master and release branches (starting
from Kubernetes v1.6).
from Kubernetes v1.7).
* As a result, they run on code that's already merged.
* As the name suggests, they run continuously. Currently, they are
configured to run
[at least once every 30 minutes](https://github.com/kubernetes/test-infra/blob/22c38cfb64137086373e1b89d5e7d98766560747/prow/config.yaml#L3686).
* Federation CI tests run as
[periodic jobs on prow](https://github.com/kubernetes/test-infra/blob/22c38cfb64137086373e1b89d5e7d98766560747/prow/config.yaml#L3686).
configured to run at least once every 30 minutes.
* Federation CI tests run as periodic jobs on prow.
* CI jobs always run sequentially. In other words, no single CI job
can have two instances of the job running at the same time.
* The latest build results can be viewed in [testgrid](https://k8s-testgrid.appspot.com/sig-multicluster).
### Configuration
Configuration steps are described in https://github.com/kubernetes/test-infra#create-a-new-job
Configuration steps are described in https://github.com/kubernetes/test-infra#create-a-new-job.
The federation CI e2e jobs are named as follows:
* master branch - `ci-federation-e2e-gce` and `ci-federation-e2e-gce-serial`
* 1.8 release branch - `ci-kubernetes-e2e-gce-federation-release-1-8`
* 1.7 release branch - `ci-kubernetes-e2e-gce-federation-release-1-7`
The configuration of CI tests are stored in:
Search for the above job names in the following configuration files:
* Prow config: https://github.com/kubernetes/test-infra/blob/499410989420f724b833b07f797fde38fff58910/prow/config.yaml
* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/499410989420f724b833b07f797fde38fff58910/jobs/config.json
* Test grid config: https://github.com/kubernetes/test-infra/blob/499410989420f724b833b07f797fde38fff58910/testgrid/config/config.yaml
* Job specific config: https://github.com/kubernetes/test-infra/tree/499410989420f724b833b07f797fde38fff58910/jobs/env
* Prow config: https://github.com/kubernetes/test-infra/blob/master/prow/config.yaml
* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/master/jobs/config.json
* Test grid config: https://github.com/kubernetes/test-infra/blob/master/testgrid/config/config.yaml
* Job specific config: https://github.com/kubernetes/test-infra/tree/master/jobs/env
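
The search described above can be sketched in a few lines of Python. This is only an illustration, assuming you have the text of one of the listed configuration files at hand (for example from a local checkout of test-infra). The helper name `find_job_mentions` and the constant `FEDERATION_CI_JOBS` are hypothetical and are not part of test-infra.

```python
import re
from typing import Dict, Iterable, List

# Job names copied from the list above.
FEDERATION_CI_JOBS = [
    "ci-federation-e2e-gce",
    "ci-federation-e2e-gce-serial",
    "ci-kubernetes-e2e-gce-federation-release-1-8",
    "ci-kubernetes-e2e-gce-federation-release-1-7",
]


def find_job_mentions(text: str, job_names: Iterable[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: map each job name to the 1-based line numbers
    in `text` that mention it as a whole name."""
    # The lookarounds keep "ci-federation-e2e-gce" from also matching
    # inside "ci-federation-e2e-gce-serial".
    patterns = {
        name: re.compile(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])")
        for name in job_names
    }
    hits: Dict[str, List[int]] = {name: [] for name in patterns}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in patterns.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits
```

With a local checkout, something like `find_job_mentions(open("prow/config.yaml").read(), FEDERATION_CI_JOBS)` would list where each job is configured in that file. A plain text search in an editor or with `grep` works just as well.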
### Results
Results of all the federation CI tests, including the soak tests, are
listed in the corresponding tabs on the Cluster Federation page in the
testgrid.
https://k8s-testgrid.appspot.com/sig-federation
Results of all the federation CI tests are listed in the corresponding
tabs on the Cluster Federation page in the testgrid.
https://k8s-testgrid.appspot.com/sig-multicluster
### Playbook
@ -39,7 +41,7 @@ https://k8s-testgrid.appspot.com/sig-federation
Please ping someone who has access to the prow project and ask
them to click the `rerun` button from, for example
http://prow.k8s.io/?type=periodic&job=ci-kubernetes-e2e-gce-federation,
http://prow.k8s.io/?type=periodic&job=ci-federation-e2e-gce,
and execute the kubectl command.
#### Quota cleanup
@ -50,115 +52,54 @@ delete button corresponding to those leaked resources on Google Cloud
Console.
## Presubmit
## Pre-submit
* We only have one presubmit test, but it is configured very
differently than the CI tests.
* The presubmit test is currently configured to run on the master
branch and any release branch that's 1.7 or newer.
* Federation presubmit infrastructure is composed of two separate test
jobs:
* Deploy job: This job runs in the background and recycles federated
clusters every time it runs. Although this job supports federation
presubmit tests, it is configured as a CI/Soak job. More on
configuration later. Since recycling federated clusters is an
expensive operation, we do not want to run this often. Hence, this
job is configured to run once every 24 hours, around midnight
Pacific time.
* Test job: This is the job that runs federation presubmit tests on
every PR in the core repository, i.e.
[kubernetes/kubernetes](https://github.com/kubernetes/kubernetes).
These jobs can run in parallel on the PRs in the repository.
### Two-jobs setup
The deploy job runs once every 24 hours at around midnight Pacific
time. It is configured to turn up and tear down 3 federated clusters.
It starts out by downloading the latest Kubernetes release built from
[kubernetes/kubernetes](https://github.com/kubernetes/kubernetes)
master. It then tears down the existing federated clusters and turns
up new ones. As the clusters are created, their kubeconfigs are
written to a local kubeconfig file where the job runs. Once all the
clusters are successfully turned up, the local kubeconfig is then
copied to a pre-configured GCS bucket. Any existing kubeconfig in the
bucket will be overwritten.
The test job on the other hand starts by copying the latest kubeconfig
from the pre-configured GCS bucket. It uses this kubeconfig to deploy
a new federation control plane on one of the clusters in the
kubeconfig. It then joins all the clusters in the kubeconfig, including
the host cluster where federation control plane is deployed, as members
to the newly created federation control plane. The test job then runs
the federation presubmit tests on this control plane and tears down the
control plane in the end.
Since federated clusters are recycled only once every 24 hours, all
presubmit runs in that period share the federated clusters. And since
there could be multiple presubmit tests running in parallel, each
instance of the test gets its own namespace where it deploys the
federation control plane. These federation control planes deployed in
separate namespaces are independent of each other and do not interfere
with other federation control planes in any way.
* The pre-submit test is currently configured to run on the master
branch and any release branch that's 1.9 or newer.
* Multiple pre-submit jobs can run in parallel (one per PR).
* The latest build results can be viewed in [testgrid](https://k8s-testgrid.appspot.com/presubmits-federation).
* We have the following pre-submit jobs in federation:
* bazel-test - Runs all the bazel test targets in federation.
* e2e-gce - Runs federation e2e tests on gce.
* verify - Runs federation unit and integration tests and a few verify scripts.
### Configuration
The two jobs are configured differently.
Configuration steps are described in https://github.com/kubernetes/test-infra#create-a-new-job.
Federation pre-submit jobs have the following names:
* bazel-test - `pull-federation-bazel-test`
* verify - `pull-federation-verify`
* e2e-gce - `pull-federation-e2e-gce`
#### Deploy job
Search for the above job names in the following configuration files:
The deploy job is configured as a CI/Soak job in Jenkins.
Configuration steps are described in https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jenkins/README.md#how-to-work-with-jenkins-jobs
The configuration of the deploy job is stored in:
* Jenkins config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jenkins/job-configs/kubernetes-jenkins/bootstrap-ci-soak.yaml#L76
* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/config.json#L3996
* Test grid config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/testgrid/config/config.yaml#L152
* Job specific config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/ci-kubernetes-pull-gce-federation-deploy.env
#### Test job
The test job is
[configured in prow](https://github.com/kubernetes/test-infra/blob/35ceb37e999bb0589218708262634951b79dfe05/prow/config.yaml#L236),
but it runs in Jenkins mode. The configuration steps are described in
https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/README.md#create-a-new-job
The configuration of the test job is stored in:
* Prow config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/prow/config.yaml#L244
* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/config.json#L4691
* Job specific config: https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/jobs/pull-kubernetes-federation-e2e-gce.env
* Prow config: https://github.com/kubernetes/test-infra/blob/master/prow/config.yaml
* Test job/bootstrap config: https://github.com/kubernetes/test-infra/blob/master/jobs/config.json
* Test grid config: https://github.com/kubernetes/test-infra/blob/master/testgrid/config/config.yaml
* Job specific config: https://github.com/kubernetes/test-infra/tree/master/jobs/env
### Results
Aggregated results are available on the Gubernator dashboard page for
the federation presubmit tests.
the federation pre-submit tests.
https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-federation-e2e-gce
https://k8s-gubernator.appspot.com/builds/kubernetes-jenkins/pr-logs/directory/pull-federation-e2e-gce
### Metrics
We track the flakiness metrics of all the presubmit jobs and
We track the flakiness metrics of all the pre-submit jobs and
individual tests that run against PRs in
[kubernetes/kubernetes](https://github.com/kubernetes/kubernetes).
[kubernetes/federation](https://github.com/kubernetes/federation).
* The metrics that we track are documented in https://github.com/kubernetes/test-infra/blob/0c56d2c9d32307c0a0f8fece85ef6919389e77fd/metrics/README.md#metrics.
* Job-level metrics are available in - [http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json]().
* As of this writing, federation presubmits have a [success rate of
93.4%](http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json).
* The metrics that we track are documented in https://github.com/kubernetes/test-infra/blob/master/metrics/README.md#metrics.
* Job-level metrics are available in http://storage.googleapis.com/k8s-metrics/job-flakes-latest.json.
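
Once the job-level metrics file has been downloaded, the federation entries can be picked out with a short filter. The sketch below assumes the file decodes to a JSON object keyed by job name. That layout is an assumption, not something this guide confirms, and the `flakes` field in the sample data is purely illustrative. The helper name `federation_entries` is hypothetical.

```python
import json
from typing import Any, Dict


def federation_entries(metrics: Dict[str, Any], marker: str = "federation") -> Dict[str, Any]:
    """Hypothetical helper: keep only the entries whose job name contains `marker`.

    Assumes `metrics` is a mapping keyed by job name (an assumption about
    the layout of the metrics file).
    """
    return {job: data for job, data in metrics.items() if marker in job}


# Illustrative in-memory data standing in for the downloaded file;
# the "flakes" field name is made up for this example.
sample = json.loads(
    '{"pull-federation-e2e-gce": {"flakes": 2}, "pull-kubernetes-unit": {"flakes": 1}}'
)
federation_only = federation_entries(sample)
```

If the real file uses a different layout, only the filtering expression needs to change.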
### Playbook
#### Triggering a new deploy job run
#### Triggering a new run
Please ping someone who has access to the Jenkins UI/dashboard and ask
them to login and click the "Build Now" link on the Jenkins page
corresponding to the CI job you want to manually start.
#### Triggering a new test run
Use the `/test` command on the PR to retrigger the test. The exact
incantation is: `/test pull-kubernetes-federation-e2e-gce`
Use the `/test` command on the PR to re-trigger the test. The exact
incantation is: `/test pull-federation-e2e-gce`
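
The same `/test` pattern presumably applies to the other pre-submit jobs listed earlier. As a small sketch, the comment text can be built from the job name. The helper `retest_comment` and the constant `PRESUBMIT_JOBS` are hypothetical names used only for illustration. The comment still has to be posted on the PR by hand.

```python
# Pre-submit job names as listed in the Configuration section.
PRESUBMIT_JOBS = (
    "pull-federation-bazel-test",
    "pull-federation-verify",
    "pull-federation-e2e-gce",
)


def retest_comment(job: str) -> str:
    """Hypothetical helper: return the PR comment that re-triggers `job`."""
    if job not in PRESUBMIT_JOBS:
        # Guard against typos so an unknown job name is caught early.
        raise ValueError(f"unknown federation pre-submit job: {job}")
    return f"/test {job}"
```

For example, `retest_comment("pull-federation-e2e-gce")` yields the incantation quoted above.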
#### Quota cleanup