Flake Finder Fridays #0
February 5th 2021 (Recording)
Hosts: Dan Mangum, Rob Kielty
Introduction
This is the first episode of Flake Finder Fridays with Dan Mangum and Rob Kielty.
On the first Friday of every month, we will go through an issue that was logged for a failing or flaking test on the Kubernetes project.
We will review the triage, root cause analysis, and problem resolution for a test-related issue logged in the past four weeks.
We intend to demo how CI works on the Kubernetes project and also how we collaborate across teams to resolve test maintenance issues.
Issue
This is the issue that we are going to look at today:
[Failing Test] ci-kubernetes-build-canary does not understand "--platform"
Testgrid Dashboard
Breaking PRs
Investigation
- Desire to move from Google-owned infrastructure to Kubernetes community infrastructure. This motivated the introduction of a canary build job to test building and pushing artifacts with the new infrastructure.
- Desire to move off of the bootstrap.py job (currently used for the canary job) to krel tooling.
- A separate job (ci-kubernetes-build-no-bootstrap) existed that was doing the same thing as the canary job, but with krel tooling.
- The no-bootstrap job was running smoothly, so we updated the canary job to use it.
- Right before the update, we switched to using buildx for multi-arch images.
- The job started failing, which showed up in some interesting ways.
- Triage begins! An issue is opened and the release management team is pinged in Slack.
- The build-master job was still passing, though... interesting.
- Both jobs eventually call make release, so the environment must be different.
- Let's look inside!

      docker run -it --entrypoint /bin/bash gcr.io/k8s-testimages/bootstrap:v20210130-12516b2
      docker run -it gcr.io/k8s-staging-releng/k8s-ci-builder:v20201128-v0.6.0-6-g6313f696-default /bin/bash

- A few directions we could go here:
  - Update the k8s-ci-builder image to use a newer version of Docker
  - Update the k8s-ci-builder image to ensure that DOCKER_CLI_EXPERIMENTAL=enabled is set
  - Update the release.sh script to set DOCKER_CLI_EXPERIMENTAL=enabled
- Making the release.sh script more flexible serves the community better because it allows building in more environments. It would also be good to update the k8s-ci-builder image for this specific case.
- And we get a new failure!
- Let's see what is going on in those images again...
- Why would this cause an error in one but not the other if we have DOCKER_CLI_EXPERIMENTAL=enabled? (this is why)
- In the meantime, we went ahead and re-enabled the bootstrap job (consumers of those images need them!).
- We decided to increase logging verbosity on failures to see if that would give us a clue into what was going wrong (and to remove those annoying "quiet currently not implemented" warnings).
- The job turns green! But how?
- Buildx is versioned separately from Docker itself. It turns out that the --quiet flag warning was actually an error until v0.5.1 of Buildx.
- The build-master job was running with buildx v0.5.1, while the krel job was running with v0.4.2. This meant the --quiet flag was causing an error in the krel job, and removing it resolved the error.
- We finished up by once again removing the bootstrap job.
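The root cause comes down to which buildx version each job happened to have. Here is a minimal shell sketch of that version check. It assumes, per the investigation above, that buildx releases before v0.5.1 reject --quiet while later ones only warn. The helper name buildx_tolerates_quiet is hypothetical and is not part of krel or the release scripts. It relies on GNU sort -V for version-aware ordering.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of krel or the Kubernetes release scripts).
# Succeeds if the given buildx version is at least v0.5.1, the release from
# which, per our investigation, --quiet produces a warning instead of an error.
# Assumes GNU coreutils `sort -V` for version-aware ordering.
buildx_tolerates_quiet() {
  local version="${1#v}"   # strip a leading "v": v0.4.2 -> 0.4.2
  local minimum="0.5.1"
  # Sort both versions ascending. If the minimum comes first (or the two are
  # equal), the given version is >= the minimum.
  [ "$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n 1)" = "$minimum" ]
}

# The two versions from this episode: build-master had v0.5.1, krel had v0.4.2.
for v in v0.5.1 v0.4.2; do
  if buildx_tolerates_quiet "$v"; then
    echo "$v: --quiet only warns"
  else
    echo "$v: --quiet is an error"
  fi
done
# Prints:
# v0.5.1: --quiet only warns
# v0.4.2: --quiet is an error
```

To check a real environment, you could pass in the version reported by docker buildx version. We are assuming the version is the second whitespace-separated field of that output, so verify the format for your buildx release before relying on it.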
Fixes
- Set DOCKER_CLI_EXPERIMENTAL=enabled for images using buildx
- Make image build logs verbose if necessary
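Both fixes are small changes to how images get built. Below is a rough shell sketch of the two ideas together. It is not the actual diff from those PRs. The first part exports DOCKER_CLI_EXPERIMENTAL=enabled for Docker CLI releases that still gate buildx behind experimental mode. The second part is a hypothetical helper, run_with_log_on_failure, which keeps successful builds quiet but prints the full captured log when a command fails.

```shell
#!/usr/bin/env bash
# Sketch only: not the code from the fix PRs.

# Enable experimental CLI features, which Docker CLI releases that still
# treat buildx as experimental require before `docker buildx` is available.
export DOCKER_CLI_EXPERIMENTAL=enabled

# Hypothetical helper: run a command with stdout/stderr captured to a temp
# file, print the captured log (to stderr) only if the command fails, and
# return the command's exit status. Written without `set -e` so a failing
# command does not abort the script before the log is shown.
run_with_log_on_failure() {
  local log status
  log="$(mktemp)"
  "$@" >"$log" 2>&1
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed (exit $status): $*" >&2
    cat "$log" >&2
  fi
  rm -f "$log"
  return "$status"
}

# Example: the successful command stays silent; the failing one dumps its log.
run_with_log_on_failure echo "build ok"
run_with_log_on_failure sh -c 'echo "ERROR: something broke"; exit 3' || echo "exit status: $?"
```

Capturing output to a temp file keeps green runs readable while making sure a red run always shows the full build output, which is the kind of information we were after during triage.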
Test Infra
- ci-kubernetes-build-canary: Migrate from bootstrap to krel
- releng: Re-enable a bootstrap build job for K8s Infra
- Revert "releng: Re-enable a bootstrap build job for K8s Infra"
Slack Threads
Helpful Links
Kubernetes Project Resources
Brand new to the project?
- Start here: https://www.kubernetes.dev/
Setup already and interested in maintaining tests?
- Check out this video from Jordan Liggitt, who describes strategies and tactics to deflake flaking tests (Jordan's show notes for that talk)
Here's how the CI Signal Team actively monitors CI during a release cycle: