[flake finders] fix episode 000 readme formatting

Fixes code block formatting in guide for flake finders episode 000.

Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
This commit is contained in:
hasheddan 2021-02-28 15:16:42 -06:00
parent d3ae637cf3
commit 1016d17ce3
No known key found for this signature in database
GPG Key ID: BD68BC686A14C271
1 changed files with 37 additions and 36 deletions

View File

@ -28,73 +28,74 @@ collaborate across teams to resolve test maintenance issues.
[build-master-canary](https://testgrid.k8s.io/sig-release-master-informing#build-master-canary)
### Breaking PRs
- [Use buildx in favor of `FROM --platform` syntax
](https://github.com/kubernetes/kubernetes/pull/98529)
- [Use buildx in favor of `FROM --platform`
syntax](https://github.com/kubernetes/kubernetes/pull/98529)
- [Switch to `docker buildx` for conformance
image](https://github.com/kubernetes/kubernetes/pull/98569)
## Investigation
1. Desire to move from Google-owned infrastructure to Kubernetes community
infrastructure. Thus the introduction of a **canary** build job to test pushing
building and pushing artifacts with new infrastructure.
infrastructure. Thus the introduction of a **canary** build job to test
pushing building and pushing artifacts with new infrastructure.
1. Desire to move off of `bootstrap.py` job (currently being used for canary
job) to `krel` tooling.
job) to `krel` tooling.
1. Separate job existed (`ci-kubernetes-build-no-bootstrap`) that was doing the
same thing as the canary job, but with `krel` tooling.
same thing as the canary job, but with `krel` tooling.
1. The `no-bootstrap` job was running smoothly, so [updated to use it for the
canary job](https://github.com/kubernetes/test-infra/pull/20663).
canary job](https://github.com/kubernetes/test-infra/pull/20663).
1. Right before the update, we [switched to using buildx for multi-arch
images](https://github.com/kubernetes/kubernetes/pull/98529).
images](https://github.com/kubernetes/kubernetes/pull/98529).
1. Job started failing, which showed up in [some interesting
ways](https://kubernetes.slack.com/archives/C09QZ4DQB/p1612269558032700).
ways](https://kubernetes.slack.com/archives/C09QZ4DQB/p1612269558032700).
1. Triage begins! Issue
[opened](https://github.com/kubernetes/kubernetes/issues/98646) and release
management team is pinged in Slack.
[opened](https://github.com/kubernetes/kubernetes/issues/98646) and release
management team is pinged in Slack.
1. The `build-master`
[job](https://testgrid.k8s.io/sig-release-master-blocking#build-master) was
still passing though... interesting.
[job](https://testgrid.k8s.io/sig-release-master-blocking#build-master) was
still passing though... interesting.
1. Both are eventually calling `make release`, so environment must be different.
1. Let's look inside!
``` docker run -it --entrypoint /bin/bash
gcr.io/k8s-testimages/bootstrap:v20210130-12516b2 ```
```
docker run -it --entrypoint /bin/bash gcr.io/k8s-testimages/bootstrap:v20210130-12516b2
```
``` docker run -it
gcr.io/k8s-staging-releng/k8s-ci-builder:v20201128-v0.6.0-6-g6313f696-default
/bin/bash ```
```
docker run -it gcr.io/k8s-staging-releng/k8s-ci-builder:v20201128-v0.6.0-6-g6313f696-default /bin/bash
```
1. A few directions we could go here:
1. Update the `k8s-ci-builder` image to you use newer version of Docker
1. Update the `k8s-ci-builder` image to ensure that
`DOCKER_CLI_EXPERIMENTAL=enabled` is set
`DOCKER_CLI_EXPERIMENTAL=enabled` is set
1. Update the `release.sh` script to set `DOCKER_CLI_EXPERIMENTAL=enabled`
1. Making the `release.sh` script more flexible serves the community better
because it allows for building with more environments. Would also be good to
update the `k8s-ci-builder` image for this specific case as well.
because it allows for building with more environments. Would also be good to
update the `k8s-ci-builder` image for this specific case as well.
1. And we get a new
[failure](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-build-canary/1356704759045689344/build-log.txt)!
[failure](https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-build-canary/1356704759045689344/build-log.txt)!
1. Let's see what is going on in those images again...
1. Why would this cause an error in one but not the other if we have
`DOCKER_CLI_EXPERIMENTAL=enabled`?
([this](https://github.com/docker/buildx/pull/403) is why)
`DOCKER_CLI_EXPERIMENTAL=enabled`?
([this](https://github.com/docker/buildx/pull/403) is why)
1. In the mean time we went ahead and [re-enabled the bootstrap
job](https://github.com/kubernetes/test-infra/pull/20712) (consumers of those
images need them!)
job](https://github.com/kubernetes/test-infra/pull/20712) (consumers of those
images need them!)
1. Decided to [increase logging
verbosity](https://github.com/kubernetes/kubernetes/pull/98568) on failures to
see if that would give us a clue into what was going wrong (and to remove those
annoying `quiet currently not implemented` warnings).
verbosity](https://github.com/kubernetes/kubernetes/pull/98568) on failures
to see if that would give us a clue into what was going wrong (and to remove
those annoying `quiet currently not implemented` warnings).
1. Job turns green! But how?
1. [Buildx](https://github.com/docker/buildx) is versioned separately than
Docker itself. Turns out that the `--quiet` flag warning was [actually an
error](https://github.com/docker/buildx/pull/403) until `v0.5.1` of Buildx.
Docker itself. Turns out that the `--quiet` flag warning was [actually an
error](https://github.com/docker/buildx/pull/403) until `v0.5.1` of Buildx.
1. The `build-master` job was running with buildx `v0.5.1` while the `krel` job
was running with `v0.4.2`. This meant the quiet flag was causing an error in the
`krel` job, and removing it alleviated the error.
was running with `v0.4.2`. This meant the quiet flag was causing an error in
the `krel` job, and removing it alleviated the error.
1. Finished up by once again [removing the `bootstrap`
job](https://github.com/kubernetes/test-infra/pull/20731).
job](https://github.com/kubernetes/test-infra/pull/20731).
### Fixes
@ -133,8 +134,8 @@ Brand new to the project?
Setup already and interested in maintaining tests?
- Check out [this video](https://www.youtube.com/watch?v=Ewp8LNY_qTg) from
Jordan Liggit who describes strategies and tactics to deflake flaking tests
([Jordan's show notes for that
talk](https://gist.github.com/liggitt/6a3a2217fa5f846b52519acfc0ffece0))
([Jordan's show notes for that
talk](https://gist.github.com/liggitt/6a3a2217fa5f846b52519acfc0ffece0))
Here's how the CI Signal Team actively monitors CI during a release cycle:
- [A Tour of CI on the Kubernetes