December 16, 2019 (recording)
Host: Chris Short?
Note Taker: TBD
Attendees:
Agenda:
- FINISH 1.17 RETROSPECTIVE https://docs.google.com/document/d/1AtZ_81F3E4y_04Gx31mnG5w6AG3E0AXDOuLU7RcUIso/edit#bookmark=id.w6sgvtx220ar
December 2, 2019 (recording)
Host: Stephen Augustus
Note Taker: TBD
Attendees:
- Stephen Augustus
- Sascha Grunert
- Taylor Dolezal
- Marko Mudrinić
- Daniel Mangum
- Ace Eldeib
- Hannes Hoerl
- Jorge Alarcon
- Kenny Coleman
- Tim Pepper
Regrets:
- Josh Berkus
- Bob Killen
Recurring Topics [timebox to N min]:
- (if you have any… here are some suggestions)
- Welcome any new members or attendees
- Subproject updates
- Licensing (https://github.com/orgs/kubernetes/projects/31)
- Release Engineering (https://github.com/orgs/kubernetes/projects/30)
- Release Managers
- hyperkube
- Release Team (https://github.com/orgs/kubernetes/projects/29)
- Testgrid dashboard review: https://testgrid.k8s.io/sig-release
- Project board review: https://github.com/orgs/kubernetes/projects/23
Open Discussion [timebox to N min]:
- Krel Integration Testing
- Action Item: set up a repo to use for integration tests
- Krel Unit Testing Framework
- Use frameworks that are used across Kubernetes so that common patterns can be copied / implemented
- ginkgo/gomega is used elsewhere in Kubernetes (a hedged sketch follows this list)
- Circle back to this at Release Engineering meeting next week
- Whitebox vs Blackbox Testing: https://github.com/kubernetes/release/pull/942#issuecomment-558707516
- https://docs.google.com/document/d/1M4670sP4PxDi1tRf1AkYGBDvahnKGZekrERneaBDVs0/edit#heading=h.ppwls1rczyo9
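To make the ginkgo/gomega suggestion above concrete, here is a minimal sketch of what a krel-style unit test could look like; `tarballName` is a hypothetical helper defined inline to keep the sketch self-contained, not actual krel code:

```go
// A minimal ginkgo/gomega unit test sketch for krel-style tooling.
// tarballName is a hypothetical helper, defined here only so the
// sketch compiles on its own; it is not actual krel code.
package release

import (
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

// tarballName composes a release artifact name from a version tag.
func tarballName(version string) string {
	return "kubernetes-" + version + ".tar.gz"
}

// TestRelease wires the ginkgo suite into `go test`.
func TestRelease(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "krel unit test sketch")
}

var _ = Describe("tarballName", func() {
	It("embeds the version in the artifact name", func() {
		Expect(tarballName("v1.17.0")).To(Equal("kubernetes-v1.17.0.tar.gz"))
	})
})
```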
- Add your suggested topic here and move this to the line below [firstname lastname, email or @slackname]
November 4, 2019 (recording)
Host: Stephen Augustus
Note Taker: TBD
Attendees:
- Stephen Augustus
- Jorge Alarcon
- Taylor Dolezal
- Bart Smykla
- Josh Berkus
- Sascha Grunert
- Nikolaos Moraitis
- Daniel Mangum
- Savitha Raghunathan
- Tim Pepper
- Todd Sedano
Recurring Topics [timebox to N min]:
- (if you have any… here are some suggestions)
- Welcome any new members or attendees
- Subproject updates
- Licensing (https://github.com/orgs/kubernetes/projects/31)
- Release Engineering (https://github.com/orgs/kubernetes/projects/30)
- Release Managers
- hyperkube
- Release Team (https://github.com/orgs/kubernetes/projects/29)
- Testgrid dashboard review: https://testgrid.k8s.io/sig-release
- Project board review: https://github.com/orgs/kubernetes/projects/23
Open Discussion [timebox to N min]:
- Add your suggested topic here and move this to the line below [firstname lastname, email or @slackname]
- WG LTS discussions update [tpepper, jberkus]
- 12-month KEP
- Today we support a release for 9 months, though in practice it’s 9 months plus 6-7 weeks
- The discussion is about 12 months of support, with a formalized trailing period of 2 months during which probably only CVE fixes land.
- Today one might upgrade in April, January, and October in order to keep aligned with supported patch releases. The proposal would allow one to pick a time period (eg: April) and always plan an annual major migration/upgrade at that time.
- What is the cost? Human and infra.
- What is the benefit? Survey showed people do want longer support. Would an additional 3 months help a significant portion of users? Survey data can be read multiple ways.
- How much are our community supported artifacts actually used?
- What portion of Kubernetes users get their bits from community versus a vendor?
- Vendor/distributors:
- Is there any support they expect from the community?
- What value do they see in our community releases and patch releases?
- Should the community make patch releases at all?
- Cherry-picking vs. accepting numbered patch release (e.g. Kernel)
- Would a vendor share this information with us? Is their code visible (eg: Red Hat’s is, although the Apache license does not require it), so we can see whether they use community patch releases or manage their own cherry-pick support branches?
- Red Hat: likely no value in a 3-month extension, because their overall support is 4+ years.
- Pivotal: their customers would like this longer cycle
- CoreOS/Tectonic in the past would have benefited from this incremental breathing room
- Upgrade scenarios:
- These are already fraught today when doing in-place upgrades of cluster nodes across a large version skew.
- The KEP would make that worse.
- Community/industry seems to be moving toward preferring deploying new clusters at new versions, migrating the workload from an older cluster to a newer one. That is also non-trivial today.
- The KEP does not address any of this, but expects future work should be a focus so cluster operators can more easily deploy and roll over to newer versions.
- Release Notes quality [tpepper]:
- Does https://github.com/kubernetes/community/blob/master/contributors/guide/release-notes.md sufficiently describe when to add a release note and how to format it for maximum information sharing?
- SIG Release owns collating a changelog. Are we responsible for the content too, and if so, how do we push 2,000-3,000 contributors toward creating better release-note stanzas in their PRs?
- Josh: it describes formatting, but not WHEN to add a release note.
- Issue open about that
- https://github.com/kubernetes/community/issues/484
- Also need to describe what goes in the release note
- Suggestions:
- https://chris.beams.io/posts/git-commit/ in particular the subset on how to write a good one-line shortlog summary
- “User-visible” changes: who is the user persona?
- Actionable, concise vs. detailed, thorough
- Take individual feelings about what the submitter/reviewer considers good out of it; give concrete, objective criteria for the types of changes that need a release note and how the associated note should look
- Needs to be socialized everywhere: k/dev, contrib-ex, community meeting, SIG leads, etc.
- Action: Sascha (and others on release notes team) to start looking at what could be done around https://github.com/kubernetes/community/issues/484 docs updates
- Add kind/deprecation label [Stephen]
- PR: https://github.com/kubernetes/test-infra/pull/15040
- How to keep required labels easy for contributors?
- Need reviewer/approvers enforcing expected quality
- Feels like we need a simplified review checklist for reviewer/approvers with all the required things.
- Action: Tim Pepper will look through devel guide and discuss with SIG Contrib Ex if this is a gap they’d be interested in driving.
- Bot should include a link to review checklist when tagging reviewers automatically
- How to socialize these expectations and build common culture around them? There’s no global “reviewer/approver” mailing list. For some reason people don’t seem to read k-dev list for announcements.
- PR Template suggestion for “primary SIG”: https://github.com/kubernetes/kubernetes/pull/82449 [Stephen]
- Proposes adding a label
- Issues/PRs today can have multiple SIGs listed. But who is the primary contact point?
- Original motivation was to clarify things in the release notes, where today many major changes will be marked with multiple SIGs, so which SIG’s summary stanza would the release note go in? A reader interested in networking topics today might need to read the SIG Networking section and a bunch of others. Solving this would need some additional level of categorization.
- Example: the submariner PRs are about multicluster networking. Should those be limited to SIG Multicluster or SIG Network?
- Group feels the benefit here is minor, and it also wouldn’t clarify the situation of multiple SIGs who are actually involved on a PR/issue; the benefit is small relative to the effort. Today the team does a lot of manual work to decide which category best fits a release note, or whether to duplicate it, and that would still be needed.
- Are people finding the markdown file or the release notes web site more useful?
- Survey results split.
- If the interactive site is useful, people can pick their individual preferred categorizations on the fly
- If we did not split on SIG in the markdown file, what would the best categorization be?
- Is splitting by SIG actually useful to people who aren’t closely familiar with project internals? “What is a SIG” remains our number one question. SIG Networking may be obvious, but what’s the difference between say SIG UI and SIG UX?
- Would splitting by Kind make sense? Or Priority (questionable value today, few use this label)? Or Area?
- Action: Sascha Grunert to demo what it might look like split on Kind.
- [justaugustus] Help me close some PRs!
- https://github.com/kubernetes/kubernetes/pull/84662
- https://github.com/kubernetes/sig-release/pull/839
- https://github.com/kubernetes/test-infra/pull/15040
- https://github.com/kubernetes/test-infra/pull/15083
- https://github.com/kubernetes/test-infra/pull/15068
- https://github.com/kubernetes/release/pull/921
- https://github.com/kubernetes/sig-release/pull/847
Oct. 21, 2019 (recording)
Host: Tim Pepper
Note Taker: TBD
Attendees:
- Tim Pepper
- Bart Smykla
- Sascha Grunert
- Jorge Alarcon
- Kenny Coleman
- Marko Mudrinić
- Daniel Mangum
- Stephen Augustus
- Todd Sedano
- Maria Ntalla
- Add your names here
Recurring Topics [timebox to N min]:
- Subproject updates
- Licensing (https://github.com/orgs/kubernetes/projects/31)
- Release Engineering (https://github.com/orgs/kubernetes/projects/30)
- Release Managers
- hyperkube
- Release Team (https://github.com/orgs/kubernetes/projects/29)
- Testgrid dashboard review: https://testgrid.k8s.io/sig-release
- Oct. 18/19 networking failure, possible fix in https://github.com/kubernetes/kubernetes/pull/84159
- Project board review: https://github.com/orgs/kubernetes/projects/23
- Test-infra freeze:
- The idea is disliked by SIG Testing: they don’t want to pause Prow development, and Prow has other users
- We’re not asking for Prow development to stop, simply for the rate of change in the Prow deployment used by Kubernetes to slow down at times.
- https://github.com/kubernetes/test-infra/commits/master/config/prow/config.yaml — a bot does a daily update of the Prow commit used by the k8s project. What quality gate runs ahead of that bot update? If there’s good initial gating, good test coverage, and signal triage and fix, perhaps a freeze is not necessary.
- We’re asking for folks to be more deliberate about when and where new things are rolled out so we’re not solely testing in production and reacting to issues.
Open Discussion [timebox to N min]:
- Add your suggested topic here and move this to the line below [firstname lastname, email or @slackname]
Oct. 7, 2019 (recording)
Host: TBD
Note Taker: TBD
Attendees:
- Tim Pepper
- Carlos Panato
- Daniel Mangum
- Dims
- Ihor Dvoretskyi
- Jeremy Rickard
- Max Körbächer
- Lubomir Ivanov
- Marko Mudrinic
- Nikolaos Moraitis
- Savitha Raghunathan
- Seth McCombs
- Stephen Augustus
- Jorge Alarcon
- Eddie Villalba
- Nabarun Pal
- Marky Jackson
- Bob Killen
- Jordan Liggitt
- Linus Arver (@listx, Google)
- Adam Kaplan (@adambkaplan, Red Hat)
- Bart Smykla
Recurring Topics [timebox to N min]:
- (if you have any… here are some suggestions)
- Welcome any new members or attendees
- Subproject updates
- Licensing (https://github.com/orgs/kubernetes/projects/31)
- Release Engineering (https://github.com/orgs/kubernetes/projects/30)
- Release Managers
- 1.13.12, 1.14.8, 1.15.5, and 1.16.3 scheduled for Tues. Oct. 15.
- hyperkube
- Release Team (https://github.com/orgs/kubernetes/projects/29)
- Testgrid dashboard review: https://testgrid.k8s.io/sig-release
- Project board review: https://github.com/orgs/kubernetes/projects/23
Open Discussion [timebox to N min]:
- Should artifacts published by publishing-bot be release-blocking? [Aaron, Nikhita]
- Came from 1.16 retrospective
- Reach out to Aaron/Nikhita asynchronously regarding this in https://github.com/kubernetes/sig-release/issues/806
- Publish instructions on how to consume alpha, beta, and rc artifacts today. We could write up a TL;DR (a hedged sketch follows this list).
- Stephen did an informal poll on Twitter. Lots of people didn’t know we make alpha/beta/rc releases today.
- Need to define the artifact list.
- There are things we can’t do today without release channels, but it will get better once we have release channels for daily, testing, prod builds.
- Doc on kubeadm w/ pre-releases: https://github.com/kubernetes/kubeadm/blob/master/docs/testing-pre-releases.md
- Buckets
- Carlos to work with Marky on one-pager summarizing basic workflow of getting/running alpha/beta/rc, aim for PR’ing into contributor or developer guide.
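As a seed for that one-pager, here is a minimal Go sketch of locating a pre-release kubectl binary via the version markers described in the kubeadm doc linked above; the marker and URL layout should be double-checked against that doc:

```go
// A sketch of finding the newest pre-release kubectl binary using the
// release bucket's version markers (see the kubeadm testing-pre-releases
// doc linked above); verify the marker layout before relying on it.
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"strings"
)

func main() {
	// latest.txt tracks the newest tag, including alpha/beta/rc cuts.
	resp, err := http.Get("https://dl.k8s.io/release/latest.txt")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	raw, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	tag := strings.TrimSpace(string(raw))

	// Binaries live under a per-tag, per-platform path.
	fmt.Printf("kubectl %s: https://dl.k8s.io/release/%s/bin/linux/amd64/kubectl\n", tag, tag)
}
```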
- Definitions of “priority/**”:
- Multiple places with a definition today:
- k/* on GitHub have labels defined at: https://github.com/kubernetes/test-infra/blob/master/label_sync/labels.yaml#L362
- priority/awaiting-more-evidence: not used much at all
- priority/backlog: we thought it was used minimally, but…
- priority/critical-urgent: not too hard for issue triage on the release team to manage this volume of issues
- priority/important-longterm
- priority/important-soon
- needs-priority: frequently used...
- 1.16 PRs merged w/o priority label: 259
- 1.16 PRs merged w/ priority label: 692
- k/community:
- k/test-infra:
- label_sync/labels.yaml leads to mousehover tooltip on GitHub
- https://github.com/kubernetes/test-infra/blob/master/label_sync/labels.md generated from the yaml
- k/sig-release:
- Example search on hound: https://cs.k8s.io/?q=priority%2Fcritical-urgent&i=nope&files=&repos=
- We’ve conflated issue severity, urgency of fixing, and activity on resolving (lifecycle/* labels)
- Multiple use cases:
- Release team uses them for priorities
- Patch release team uses them for assessing urgency, but only minimally and practically ignores them; they’re a very low-quality signal.
- Individual SIGs use them for triaging and stack ranking today’s work relative to a milestone.
- Implementation varies from SIG to SIG and contributor to contributor...do we need to drive out of SIG Release a cross-project KEP for better contributor experience?
- The release team struggles to answer whether a given "priority/important-soon" item is really important, or is backlog (which should be covered by "priority/important-longterm")
- May mean something different in master branch versus patch release branches
- Is priority varying based on time-to-release?
- how are items promoted or demoted in priority in general?
- how are items promoted or demoted from current milestone to no milestone or to next milestone? [Marko Mudrinić, Niko Pen, Stephen Augustus, Maria Ntalla]
- What are we trying to solve if we change these?
- Maybe nothing specifically, and maybe we don’t actually change anything... just focus on making the multiple places of documentation consistent.
- Volunteer to clean the existing documentation to be consistent? Make one document describing what we have today. Link it from the other places that currently describe things in different ways. @alejandrox1 will look into this.
- Discuss stability releases [Stephen]
- Tracking issue: https://github.com/kubernetes/sig-release/issues/809
- What is stability?
- Need to improve the CI signal
- KEPs are supposed to come with test cases. If they included a couple of links to the testgrid location that shows their green-ness or red-ness, that would be hugely valuable to the release team. Most KEPs start out with just a stated intent to add tests. A KEP could pick a tag name for its e2e tests and pre-compose an (initially broken) link to the testgrid board that will eventually show their healthy test.
- Update KEP template. Add feedback here: https://github.com/kubernetes/enhancements/issues/822
- GitHub issue for Retro AIs (action items): https://github.com/kubernetes/sig-release/issues/806
- Add your suggested topic here and move this to the line below [firstname lastname, email or @slackname]
September 23, 2019 ([recording](https://youtu.be/K2CZAxfeVIU))
Host: Stephen Augustus
Note Taker: TBD
Attendees:
- Add your names here
- Todd Sedano
- Savitha Raghunathan
- Bart Smykla
- Marko Mudrinić
- Jorge Alarcon
- Bob Killen
- Craig Peters
- Jeffrey Sica
- Nikolaos Moraitis
- Tim Pepper
New topic proposals:
- (Craig Peters) 1.16 Release Retro, part two:
- See retro doc for full details: https://bit.ly/k8s116-retro
September 4, 2019 ([recording](https://youtu.be/uztvn3NJu5U))
Host: Stephen Augustus
Note Taker: Guinevere Saenger
Attendees:
- Add your names here
- Josh Berkus
- Hannes Hörl
- Jorge Alarcon
- Marko Mudrinić
- Nicholas Lane
- Lachlan Evenson
- Guinevere Saenger
- Yang Li
- Jim Angel
- Josiah Bjorgaard
- Tim Pepper
Open Discussion [timebox to N min]:
- Blocking/Informing Jobs discussion and documentation
- How do we set criteria for flakiness?
- Tracking issue: https://github.com/kubernetes/sig-release/issues/773
- “Does the test job reliably pass when it ought to pass?” => this is not something we can reliably check. We need to come up with a new criterion that our infra can evaluate, so we can see the answer at a glance (one illustrative form is sketched after this list).
- Tie this to existing test-infra UI: e.g. must not be on the flaky test board. TODO: everyone go comment on the above-linked issue
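One illustrative (not agreed-upon) form such a machine-checkable criterion could take is a flake rate over a recent window of runs compared against a threshold; the window size and the 5% gate below are purely hypothetical:

```go
// An illustrative machine-checkable flakiness criterion: the fraction of
// failing runs within a recent window. The window size and 5% threshold
// are hypothetical, not agreed policy.
package main

import "fmt"

// flakeRate returns the failure fraction over the most recent `window` runs.
// results[i] is true for a passing run, false for a failing one.
func flakeRate(results []bool, window int) float64 {
	if window > len(results) {
		window = len(results)
	}
	if window == 0 {
		return 0
	}
	failures := 0
	for _, passed := range results[len(results)-window:] {
		if !passed {
			failures++
		}
	}
	return float64(failures) / float64(window)
}

func main() {
	runs := []bool{true, true, false, true, true, true, true, false, true, true}
	rate := flakeRate(runs, 10)
	fmt.Printf("flake rate: %.0f%%, meets a hypothetical 5%% gate: %v\n", rate*100, rate < 0.05)
}
```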
- What's the procedure for adding new jobs to -informing and documenting them? What about Blocking?
- Tracking issue: https://github.com/kubernetes/sig-release/issues/774
- we do not have a good process to determine which jobs should get added. This is an issue in the 1.16 release with the new Windows jobs. We have neither process nor documentation; past process has been rather ad hoc
- one suggestion is to pass job responsibility to the SIGs via their individual testgrid boards; this needs cooperation from SIGs, but they may feel more ownership and hence react faster.
- concern: would this mean more boards for CI Signal to watch?
- Jorge’s comment: nothing should change for CI signal. SIGs will communicate with release team / SIG Release about what goes on their dashboards.
- we want SIGs to be able to say “these jobs are absolutely necessary to pass” vs “these jobs are under construction”
- we have release-informing and release blocking jobs already
- we should focus on reducing our burden but not create process for other SIGs
- Jorge’s comment: we propose to ask SIGs to look at their dashboards in a similar manner as CI Signal does, for better communication.
- we do want a way to be able to communicate back and forth between CI Signal and the SIGs’ jobs
- CI Signal has been reacting to Jobs that do not alert SIGs, which is a misalignment/miscommunication.
- TODO: Please send these as comments on the above-linked issue.
- recently a SIG-cli job was merged but did not have a SIG-cli alert on it: make sure the correct people are responsible and aware
- master-informing, after discussion with SIG-Release, can be done fairly easily. master-blocking should be much more strict and definitely be approved by SIG-Release; perhaps a specific issue template that outlines criteria and check boxes. That way it’s under SIG-Release OWNERS file.
- We also need test-infra to have some enforcement so a test can’t just be added via a kubernetes annotation on the Job.
- Prerequisites:
- CI Signal Lead should need to sign off (coordinate with rel team lead)
- SIG-Release chairs also need to sign off
- How should the Release Team decide that Informing failures are tolerated/ignorable?
- this is more of a release team process issue; we should document this.
- THANK YOU to the CI-Signal team and Josh for putting all this work into it so we can work on it!
- What would need to be true so that we are confident to remove the blockage on k/release? [Hannes Hoerl, hhorl@pivotal.io @hoegaarden]
- Issue: https://github.com/kubernetes/release/issues/816
- this surfaced during cleanup of the push-build script
- there are a few CI-signal build jobs that push Kubernetes builds up to GCP; they got broken, bringing Kubernetes CI down
- the blockade exists to prevent this from happening
- what does it take to prevent this from happening? We need to feel comfortable making safe changes; one idea is to pin jobs to a known-good tag
- we should sync k/release with this
- for discussion in next meeting: what can we do with tags and branches?
- WG LTS requesting feedback on draft KEP:
- Staffing impact of supporting an additional release branch (ie: 3-4 additional months)
- Patch release inputs: Minor costs for 4 instead of 3 branches for cherry picks since most picks land on all branches in practice today (have data on hundreds of patches showing this)
- Patch release outputs: building is straightforward, mostly automated, a minor additional cost. Packaging is currently the long pole (it still depends on a small set of Googlers), so it becomes an even longer pole... could consume about a full day of one person’s time to get out patch release packages for a set of 4 branches.
- Incremental cost of CI on an additional release branch (also asking SIG Testing) for a cycle (ie: 3-4 additional months)
- Unknown; would need numbers from Google, or a guesstimate / mock
- Formalization of process for removing “oldest” branch’s CI (eg: a “Remove-CIPresubmit-jobs” section like https://github.com/kubernetes/sig-release/blob/master/release-engineering/role-handbooks/branch-manager.md#Create-CIPresubmit-jobs).
- This has moved out of the release team into the branch manager role. It is effectively an arbitrary date today (when the beta is cut and the new branch is created) but could be a more reasoned date further out in time (ie: +2 months)
- Better definition of:
- the minimum period after new release N during which the oldest release can still accept critical fixes. Practically this is ~1 month today; this proposal suggests 2 months.
- policy for patch merges during this special 2 month window, eg: critical security fixes (ie: CVE assigned by Product Security Committee and cherry pick of fixes initiated to release branch by Product Security Committee).
- Formalize 3rd party dependency update policy (eg: golang, but also others too)
- Patch release updates: Need to track upstream golang patch releases continually.
- Major release updates: Can we also formalize when to move to a new Golang major release? This is needed as two of our annual releases today (and all of them if we move to a yearly lifecycle) run beyond the end of Golang’s patch release lifecycle.
- Suggestion from Stephen Augustus: maybe go for 15 months
- Concerns on additional burdens:
- staffing impact for additional release branch (patch release team)
- vast majority of cherry picks go onto all branches
- we need to chip away at removing the Google curtain: updates to apt and yum
- Stability concerns for APIs and migration issues
- This would be a policy change with a large group of stakeholders
- Discussion to continue at kubecon / contributor summit (WG LTS has proposed a session for the contrib summit; if not accepted on the fixed schedule, it will also be proposed there at the un-conference)
- KubeCon NA 2019 SIG Release Intro session proposed:
- KubeCon NA 2019 SIG Release Deep Dive session proposed:
- Contributor Summit (day before KubeCon) session proposals:
- Due by Sept. 9 - verify deadline(?)
- ...what do we have needing f2f discussion?
- Alternatively can propose un-conference on the spot there
Recurring Topics [timebox to N min]:
- (if you have any… here are some suggestions)
- Welcome any new members or attendees
- Subproject updates
- Licensing (https://github.com/orgs/kubernetes/projects/31)
- Release Engineering (https://github.com/orgs/kubernetes/projects/30)
- Release Managers
- hyperkube
- Release Team (https://github.com/orgs/kubernetes/projects/29)
- Testgrid dashboard review: https://testgrid.k8s.io/sig-release
- Project board review: https://github.com/orgs/kubernetes/projects/23
August 12, 2019 ([recording](https://www.youtube.com/watch?v=HycpkheG7hU&list=PL69nYSiGNLP3QKkOsDsO6A0Y1rhgP84iZ&index=12&t=0s))
Host: Tim Pepper
Note Taker: Josh Berkus
Attendees:
- Bob Killen (Enhancements Shadow)
- Hannes Hoerl (patchReleaseTeam)
- Nikhita Raghunath
- Kenny Coleman (enhancement lead)
- Jeffrey Sica (release team)
- Sascha Grunert (release team)
- Josh Berkus (release team, etc.)
- Tommy Chong
- Claire Laurence
- Savitha Raghunathan
- Bart Smykla
- Angela Lewis (New attendee, yay!)
- Carlos Panato (Release Manager Associate)
- Imran Pochi (Communications Team shadow)
- Ace Eldeib (Release Manager Associate)
Recurring Topics [timebox to N min]:
- (if you have any… here are some suggestions)
- Welcome any new members or attendees
- Subproject updates
- Hyperkube (probably won't be doing status in the future)
- Licensing
- really need to have meetings; having trouble finding a time
- Josh votes to have licensing status updates in release meetings
- Publishing-bot (after today will report status through release engineering status bucket)
- Currently removing 1.12 stuff, and adding 1.16: https://github.com/kubernetes/kubernetes/pull/81287
- Moved to CNCF infrastructure
- Bot is happy
- Release Engineering
- First subproject meeting was last week: agenda/minutes doc is: https://docs.google.com/document/d/16GqCjnEh86w8yADcrUylNoE1y1sqjIMYNC_gdk5WPSQ/edit#
- Been working on some high-level ideas
- Now working on implementing a lot of those; testing them in the 1.16 cycle.
- First -- continuing to move stuff to places where non-Googlers can access them
- Second -- publish to CNCF infra
- Just publishing one Deb repo and one Yum repo, with all k8s versions in one spot. Need to switch to one repo per k8s version.
- Then can also split each of those into nightly, testing (eg: alpha/beta/RC), stable with a promotion flow for artifacts.
- Do the CICD pipeline thing
- Release Team
- 1.12 jobs leaving test-infra
- 1.16 jobs to start with branch creation this week
- A few enhancement exceptions are coming through
- 1.16 branch cut this week
- Next week burndown begins
- Code freeze at end of month
- CI is reddish but we should be cleaning it up
- Testgrid dashboard review (https://testgrid.k8s.io/sig-release)
- Went over test boards
- sig-release-misc: green
- sig-release-orphaned
- currently all upgrade/downgrade tests are orphaned, except Kind
- Kind test is release informing (see https://github.com/kubernetes/kubeadm/issues/1599 regarding moving to blocking)
- need to have upgrade/downgrade tests that we trust
- some SIG-Cluster-Lifecycle tests might be OK to promote to at least informing for 1.16 (and blocking for 1.17?) -- Josh to follow-up
- SIG-Cluster-Lifecycle will not do downgrade coverage
- Hopefully cluster API and kubeadm work enable us to also get signal on providers beyond GCE (and Kind)
- Project board review
Open Discussion [timebox to N min]:
- n/a
July 30, 2019 ([https://youtu.be/I38zmgbpdwE](https://youtu.be/I38zmgbpdwE))
Host: Stephen Augustus
Note Taker: Tim Pepper
Attendees:
- Stephen Augustus
- Lachlan Evenson
- Mike Crute
- Josh Berkus
- Tim Pepper
- Caleb Miles
- Aaron Crickenberger
- Ben Elder
- Dhawal Yogesh Bhanushali
- Kendrick Coleman
- Khosrow Moossavi
- Michael Singh
- Savitha Raghunathan
- Khosrow Moossavi
New agenda items
- Dhawal Yogesh Bhanushali: KEP draft has started in WG LTS to propose changes to K8s release support lifecycle and upgrade path.
- https://docs.google.com/document/d/1uOQkN_B4SDepvPGEJ0FX89Q589zPqSItpWVAvdvELMU/edit#heading=h.ul9lgso7jzs6
- Proposes move from 9-ish months support on each of 3 releases to 12-ish months support on each of 4 releases
- Policy has been specific about 9 months. But it’s “-ish” on 9 and 12 months because today the patch release team actually gives more like 10-11 months on the trailing-most release instead of just 9. In aiming to give a year of support, it could similarly be one year plus a bit of overlap.
- There’s been discussion of more explicitly allowing critical CVE fixes after the 9 months. We don’t have data to show that’s ever actually happened, though.
- Today technically we could support an older release branch relatively easily and with some confidence so long as CI is still available on the branch. Once CI goes away it’s much riskier.
- This is not a large delta on what is done today, but is surely more palatable if more diverse companies are more active in the patch release team showing that the trailing support is sustainable
- N-2 upgrades and deprecation cycles:
- Multiple folks express more skepticism here in that this is a more radical shift in what is done today and would require much more buy-in broadly in the community (Steering, Architecture, etc.)
- Why is this a blocking pre-requisite on the other aspect of expanding the support timeline a few months? Dhawal suggests this is intended to be complementary, because an end user who is lagging behind struggles to upgrade.
- [spiffxp] Upgrade being hard upstream means it’s also hard downstream.
- [Caleb] But downstream productization teams, if they are having notable friction on upgrade, should be staffing reducing that friction. If they’re looking for easier qualification, why don’t we see them contributing to easier qualification in the community?
- [Ben] it’s not just people costs where companies need to show up and contribute, but also just on moving to a longer lifecycle does incur real infra dollars costs
- [jberkus] we don’t have confidence in our N-1 tests today, and that’s a pressing need to resolve before expanding it to N-2.
- Stephen Augustus: Doodle for changing the SIG Release meeting time to be more internationally friendly, also alternating weeks with the Release Engineering subproject.
- Maybe 10am Pacific: SIG Release one week and then the other week Release Engineering Subproject.
- SIG Testing has just decided to move to that slot.
- A doodle should be in folks’ inboxes later today.
- Notice sent to MLs: https://groups.google.com/d/topic/kubernetes-sig-release/jw2UV-fKjEY/discussion
- Tracking issue: https://github.com/kubernetes/sig-release/issues/411
- (if time) Aaron Crickenberger: moving sigs.k8s.io/kind CI into release-blocking
- Kind is cool
- Kind is quick
- Kind is reliable on conformance tests
- Especially for IPv6 gives first ability for A/B network test on k8s versus underlying network infra
- Kinder’s doing some upgrade testing in release-informing board.
- Kind could come into pre-submits in the near term maybe.
- It is finding real bugs today.
- Group’s liking the proposal...make it so.
Standing agenda items
- Subproject status
- Hyperkube: should this code just be clumped under release-engineering?
- Do we need to be tracking its status? It’s owned by SIG Release. If connected to the rel-eng work, would more people have awareness of its state and health, and grow to be contributors on it when needed?
- Hyperkube is used for Azure and Talos and a couple of things, and doesn't need much/any actual maintenance
- Licensing
- Nikhita’s taking point on this with help from Dims and Steve Winslow from LF
- It is 3am her time... we need a meeting she can reasonably attend
- Publishing-bot: should this code just be clumped under release-engineering?
- This will go away eventually once the monolith splits, but this is big and complicated.
- This subproject is staffed reasonably sustainably. Need a meeting Nikhita can attend.
- Is it really part of the release process, versus an artifact of code organization?
- Release-engineering
- OWNERS for k/k: https://github.com/kubernetes/kubernetes/blob/master/OWNERS_ALIASES#L125-L140
- beta/stable1/stable2/stable3: confusion remains. Do we need a more visible location for this info vs https://github.com/kubernetes/test-infra/blob/master/docs/kubernetes-versions.md (recently moved out of the top level test-infra/README.md)
- Release Team
Issue backlog scrub (https://github.com/kubernetes/sig-release/issues)
- SIG Release project board: https://github.com/orgs/kubernetes/projects/23
- Release Engineering project board: https://github.com/orgs/kubernetes/projects/30
- Release Team project board: https://github.com/orgs/kubernetes/projects/29
- Licensing project board: https://github.com/orgs/kubernetes/projects/31
Date: July 16, 2019
Attending:
Note taker:
- [BenTheElder] kind as release blocking, particularly IPv6
- Ben will open an issue in k/sig-release
- (see also below in notes discussion of release blocking criteria)
- [Stephen Augustus] Migrate orphaned release jobs
- https://github.com/kubernetes/test-infra/pull/13468
- No owner of test means not a release blocking test.
- [Stephen Augustus] Blockade, release related PR merge status
- A bunch of small improvement patches merged and it broke stuff
- Turns out small changes to k/release magically can break blocking CI!!!
- Patches reverted.
- Issue: https://github.com/kubernetes/release/issues/816
- Prow has path-based regex for blocking patches. Until we get better on test coverage, we need to have clear SIG Release / releng subproject review of patches to k/release that might cause an issue. Prow will add “do-not-merge/blocked-paths” and block PR merge until that label is removed.
- k/release repo needs tagging so it is clear which code was used to build which k8s releases, or what are known good points in the repo
- CI can be pointed at these tags, instead of possibly unsafe intermediate states
- [Josh Berkus] should we have a blockade on k/release during code freeze?
- The more of k/release that is teased into k/k/build, the more that is implicitly versioned and frozen/thawed in sync with k/k.
- More on that release engineering cleanup brainstorm in https://docs.google.com/document/d/1Js_36K51Q6AjEsVUjRBMISTA4D7cnjZmoSkn43_Jmxo/edit#heading=h.y81m7zlfl31g
- [Ben Elder] push-build is a key tool that comes from k/release
- Release Engineering Subproject:
- What are we building/releasing? Bart Smykla did a deep dive into making a manifest document of all our artifacts:
- https://github.com/kubernetes/sig-release/blob/master/release-engineering/artifacts.md
- The better we have this documented, the easier it is to make a change and show that the change still produces the same outputs as the prior tooling (and write validators/tests for each output)
- WG K8s Infra gave us GCP projects for PoC test building and “releasing”
- “Omni” KEP and GH issues in the works to spell out tasks to improve the build/release pipe and tools
- What are we building/releasing? Bart Smykla did a deep dive into making a manifest document of all our artifacts:
- [Josh Berkus] Implementing blocking criteria
- This has gone from proposal to effectively accepted policy, but...is informal
- Time to accept, etc? What more needs to be done?
- Aside from CI-Signal handbook, where would docs on this go?
- https://github.com/kubernetes/community/tree/master/contributors/devel/sig-release
- Josh will write a document and PR it
- Misc:
- Paul Bouwer is a new 1.16 release team shadow member helping with release notes
- Release Team happenings:
- “Branch manager” role is a part of “release managers” group now.
- The idea is to build a peer-team relationship between the next major release’s branch management and the patch release streams’ branch management. The patch release team is embargoed and deals with security patches.
- We need a pipeline of volunteers who have demonstrated their ability and trustworthiness, eg: branch management during a release. These are “associates”: https://github.com/kubernetes/sig-release/blob/master/release-managers.md#associates
- Need to do more formalization on the time bounds of apprenticing, what the path/ladder looks like in detail, and what the next step is that a person grows into
Date: July 2, 2019
Attending:
Note taker:
1.15 Retro:
- Circle back to making sure discussion points are documented in issues/PRs with owners
- Pick up with the remainder of the retro: https://docs.google.com/document/d/1Re80f4qICEKLEhOEIFuzr0IZTp2es82urMi45PrScLM/edit#bookmark=kix.occneu8jg1e5
SIG Scalability and Release interactions:
- Informing tests that are going to block need to actually be on the blocking board. This means those tests need to meet the criteria for the blocking board.
- Scale tests need to be easier to spin up in an ephemeral context to validate a suspicion.
- Major/large scale tests likely need to happen not just on master
- SIG Scalability wants revert privileges; SIG Release needs a stable release with scale quality; some other SIG(s) want a feature, fix, or enhancement in the code base. A simple revert doesn’t satisfy more than 2 of these 3 stakeholders. Who owns driving things to a place where all 3 are satisfied?
- SIG Scalability has indicated there are changes coming to help, but no record of this can be found in their meeting agendas (one in 2019?) or YouTube recordings (none since 2017).
Date: June 18, 2019
Attending:
Note taker:
- [Stephen] 1.15 Release Delay: https://github.com/kubernetes/kubernetes/issues/79096
- Waiting on SIG Scalability, should hear back tomorrow morning
- Proposal to remove test infra role from release team
- See https://github.com/kubernetes/sig-release/issues/631
- Current role description: https://git.k8s.io/sig-release/release-team/role-handbooks/test-infra
- Item to be discussed in 1.15 retrospective https://docs.google.com/document/d/1Re80f4qICEKLEhOEIFuzr0IZTp2es82urMi45PrScLM/edit
- Any concerns from SIG Release?
- Stephen is +1
- Tim Pepper worried that we might be transferring responsibility to a smaller set of people
- Aaron will do everything
- 1.15 retrospective (https://docs.google.com/document/d/1Re80f4qICEKLEhOEIFuzr0IZTp2es82urMi45PrScLM/edit) tentatively scheduled for June 20 in Community Meeting http://bit.ly/k8scommunity.
- Broken over two meetings
- first the community meeting (this thursday)
- Second in sig-release
- Tim: brought up discussion of whether to have it or not
- Delay of 1.15 as a retro item
- sig-scalability gating to the release discussion as a separate discussion (is happening in thread here: https://groups.google.com/d/topic/kubernetes-sig-release/Qz7sVbMYu2c/discussion )
- SIG Scalability issues in general
- Tests split to release informing: their tests don’t formally meet the criteria for release blocking.
- But… if SIG Scalability can block on some context, how do we get that represented in a blocking test case?
- Where / how (beyond the 1.15 retro) can SIG Release have an active discussion with SIG Scalability? They seem to have met only once this year: https://docs.google.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit
- [jhoak]: https://github.com/GoogleCloudPlatform/k8s-cluster-bundle as a method for packaging Kubernetes manifests/yamls and CRDs.
- KubeCon EU 2019 presentation: https://www.youtube.com/watch?v=6D5JMFqlov4
- Similar to, inspired by kustomize. Based on custom resources. Includes a cli.
- Intriguing patterns around:
- decoupling build from packaging and following that with qualification and promotion/release
- Machine readable output of total dependency set
- Output to channels (rapid, regular, stable)
- [jsica] relnotes.k8s.io -- dot-release data population?
- 1.15.1: just run the normal patch release process with the existing rel notes generator
- 1.15.x: we can iteratively prove out an alternative rel notes generator and shift if/when we’re happy
- Container Image Promoter
- Via Slack Linus Arver says [14:05] “doh, i can’t make today’s meeting. but, i do have an update: i will be working on the e2e testing for the container image promoter for the rest of the month” (in conjunction with Amy Chen)
- #wg-k8s-infra slack channel is where to contribute if you’re interested
- The primary current gap on the e2e test is: how do we prove and ensure that, if this tool somehow went wrong, it could not inadvertently delete everything we’ve ever published? (one option is to never delete things)
- Is it possible for this to become a general tool that can pull and promote arbitrary artifacts?
- Release Engineering subproject meeting and regular backlog burndown to start once 1.15 is done
- Project board: https://github.com/orgs/kubernetes/projects/30
- Licensing subproject same as ^^^
- Project board: https://github.com/orgs/kubernetes/projects/31
- Draft Exit Questionnaire for Release Team Mentoring [jberkus]
- [Lachie] 1.16 Things
- Proposed schedule: https://github.com/kubernetes/sig-release/pull/678
- Shadow application closing out THIS Friday. Let’s plug it! - Survey
Date: June 4, 2019
Attending:
Note taker:
- Release Section Lead policy on holding a position a 2nd time [Jberkus]
- Should we have a policy of encouraging turnover?
- A couple current release leads are interested in staying in their role into the next cycle.
- All roles currently have shadows and it appears all of them could step up into the lead role.
- Multiple voices expressing that turnover is good, growing the pool of skills.
- Not letting a shadow step up is a negative.
- Option still for current lead to contribute on “emeritus” work or spin out onto sub-projects with their experience from being lead. Don’t need a title to do good things.
- Possibly limit lead to two terms.
- Shadow could also desire to stay as shadow for an additional cycle.
- Josh Berkus will PR some sample text into the selection process document... something lightweight and pragmatic, balancing leading/shadowing and continuity/turnover.
- 1.16 Release Team [Claire]
- Assembling next team is underway
- Lachlan Evenson nominated/accepted to lead 1.16 release team
- https://github.com/kubernetes/sig-release/issues/648#issuecomment-498467764
- Expect activity in coming week.
- Shadow Questionnaire [jberkus]
- 1.16 questionnaire revisions
- Exit questionnaire for leads & shadows? Goal of shadow process and emeritus lead’s involvement there is to actually grow contributors and contributions. How are we measuring that at the end of a cycle? Folks agree a feedback loop is important.
- Emeritus Advisor -- continue to 1.16? [jberkus]
- Josh feels this needs to be done for at least one more cycle. He’s happy to do it, but also happy to share the work of tightening the feedback loop.
- Beyond 1.16 should consider whether this needs to be a permanent role.
- Release Notes Website [jeefy]
- relnotes.k8s.io (Hooray) it’s alive, with 1.14 content as sample
- Draft 1.15 notes should be in by end of week
- Should we rip off the band-aid and prefer that site over the markdown?
- How much work to maintain both?
- Release notes verbiage drafting would need to be duplicated.
- Otherwise not too hard.
- Jeff willing to do both in parallel.
- [dims] would there still be a single file somewhere that can be used and viewed and presented offline? [jeff] The design intent was to take the huge list of changes out of the changelog and put them in a tool that makes them searchable on dynamic criteria.
- Human readable summary in static changelog is still desirable on an ongoing basis.
- Relnotes.k8s.io currently shows the latest release. Needs a tweak for permalinking so that the 1.15 changelog links to relnotes.k8s.io 1.15 content. This also gives permalinks to queries and results in the tool. People do want to share a view they’ve found in the data.
- Let’s dooooo it! (Pending Claire’s approval)
- Publishing version requirements (etcd, cni, coreDNS..) for kubernetes releases [yastij]
- https://storage.googleapis.com/kubernetes-release/release/stable-1.14.txt has our current version in a machine-readable form
- https://github.com/kubernetes/kubernetes/blob/release-1.14/CHANGELOG-1.14.md#external-dependencies has dependencies in a human generated form.
- Umbrella issue: https://github.com/kubernetes/sig-release/issues/601
- The set of things published MUST be the set of things tested. The tests need to depend on a machine-readable form; then that form can also be published.
- What automation can track bumps of dependencies? Watch the many known places where dependencies are stated, eg: a “verify_external_deps.sh” that does regexp parsing to create a current components yaml file. A mismatch would cause this verify step to fail and warn the author to also update the central file. Over time, shift toward preferring the central file as the single source of truth (a hedged sketch follows).
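A hedged sketch of what consuming such a machine-readable single source of truth could look like in Go; the file name, schema, and field names are illustrative assumptions, not an agreed project format:

```go
// An illustrative consumer of a hypothetical machine-readable
// external-dependency manifest; the file name and schema are assumptions,
// not an agreed Kubernetes format.
package main

import (
	"fmt"
	"io/ioutil"

	"sigs.k8s.io/yaml"
)

// Dependency pins one external component (etcd, CNI, CoreDNS, ...).
type Dependency struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

// Manifest is the pinned dependency set for a release branch.
type Manifest struct {
	Dependencies []Dependency `json:"dependencies"`
}

func main() {
	raw, err := ioutil.ReadFile("external-dependencies.yaml") // hypothetical file
	if err != nil {
		panic(err)
	}
	var m Manifest
	if err := yaml.Unmarshal(raw, &m); err != nil {
		panic(err)
	}
	for _, d := range m.Dependencies {
		fmt.Printf("%s pinned at %s\n", d.Name, d.Version)
	}
}
```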
- Triage Workflow improvements [nikopen]
- Eventual removal of Triage role after the following Epic is implemented:
- https://github.com/kubernetes/community/issues/3456
- https://github.com/kubernetes/test-infra/pull/11818
- Git workflows on our repos [tpepper]
- issue is that repo admins can edit files directly on the repo
- educate everyone to NOT do this
Date: May 7, 2019
Recording:
Attending:
- [justaugustus] SIG Release project boards
- SIG Release: https://github.com/orgs/kubernetes/projects/23
- Release Team: https://github.com/orgs/kubernetes/projects/29
- RelEng: https://github.com/orgs/kubernetes/projects/30
- Licensing: https://github.com/orgs/kubernetes/projects/31
- [Niko Pen] Triage Workflow improvements
Date: April 23, 2019
Recording: https://youtu.be/s24kIs1RF_4
Attending:
- [Claire Laurence] if needed, ensure the release team group has had a time/place to discuss the changes proposed from the 1.14 retrospective
- [claurence] Release Ecosystem Projects that have dependencies on the Release Cycle
- Earlier today SIG PM had some discussion on this
- Release lead handbook indicates lead should sync up with other projects / dependencies to coordinate ahead of release. But aside from kubeadm there’s not much specified.
- In-tree cloud providers could be another place coordination might be required.
- What does “ecosystem” mean?
- cAdvisor?
- etcd?
- Kubernetes-cni (and the others? all the CSI and CRI providers?)
- Kubeadm: was bound to pull the latest existing stable label, but it does not exist yet. That was subseque
- AI: SIG Release leads to investigate tracking dependencies
- Draft a KEP: we need a machine readable, structured, single source of truth, broad OWNERS set. Code that needs to get “etcd” should get the version specified in this file. Release notes should use this file. A PR changing a dependency in this file should get a special label and release notation and special review.
- [justaugustus] Tracking out-of-tree enhancements
- https://docs.google.com/presentation/d/1L0yM5t1e_61tUi1bu_8swnStznocjqka-aQvMIp1OpM/edit#slide=id.g4a3f70faaf_0_107
- Could use a “one sheet” howto on doing KEP work. A final step in that would include a reminder to PR a line into the KEP’s implementation history section referencing the implementing PR.
- Release team tracking enhancements needs a steel thread to understand that intended work is on track, see the PRs merged, see they have test code, see the test is green.
- SIG PM has had some change in leads and folks are unblocked to do more; expect more roadmap on improvement plans for process (and possibly also tools).
- [justaugustus] How should we handle stale enhancements?
- 1.15 enhancement collection initially found 130 open issues. It’s down to about 36 tracked issues. What are the other hundred issues? Kenny commented on those asking whether the enhancement would target the current milestone. A small percentage responded no, with about an 80% non-response rate.
- Understanding graduation (and ejection?): if an enhancement is proposed, comes in at alpha and sits there with no meaningful progress, what is the next action?
- There is a deprecation policy, but not exactly ejection/deprecation criteria for languished alphas. Should the enhancement tracking process in the Release Team include a nudge to owning SIGs to consider removing code? Also SIG Arch / PM / Testing / Docs.
- All alphas should have feature gates (features.go); this is a place we can look for ones that have stalled out.
- Unified graduation criteria does not exist.
- It’s easy for us to focus on attacking the things that are moving, but we neglect these things that are languishing. Important for overall code health.
- Release Engineering needs, planning
- see also: https://docs.google.com/presentation/d/1usEVcXJirUXCYr7DIBnS7QFZwxzniOEXlm_ek-yy85M/edit#slide=id.g4dff9d87c8_0_643
- WG K8s Infra pre-requisites:
- Current focus is container images:
- All the cloud providers and C[RNS?]I providers have their own disjoint repositories.
- A tool has been built to allow a limited set of people to “promote” their image into a gcr.io repository.
- An owning subproject would do their dev, do their build, do their test, publish their image to their repository, and ping the core project. The core (as represented today in this “alpha” phase by wg-k8s-infra) can then review and accept the image into the k8s release image repos. There will be a directory structure hierarchy and OWNERS files to ensure this becomes an ungated process.
- See https://github.com/kubernetes/kubernetes/issues/76738
- A second work area: package repos (RPMs / Debs).
- Needs a clear owner or two. Hannes Hoerl and Tim Pepper are interested.
- Can follow the same process as image promotion. The next needed step is getting packages built nightly.
- Do we need a consolidated KEP? Currently they’re split on package build and package publish. These are distinct, independent steps. But the vision of the pipeline in which they sit also needs to be a shared artifact. https://docs.google.com/document/d/1d4eYH5_RlJ5kuJ0fWjVBFtanBezOavSOjQ6h2Yr8IGU/edit?usp=sharing
Date: April 9, 2019
Recording: https://youtu.be/jWNLF6Iotyo
Attending:
- [tpepper] rpms/debs support policy. See: https://groups.google.com/d/topic/kubernetes-sig-release/zkYkzzMGjic/discussion
- https://github.com/kubernetes/release/pull/693
- https://github.com/kubernetes/release/pull/688
- Should get a document together for consideration of competing stances and then create a KEP
- [tpepper] Patch Release Management:
- GitHub: @kubernetes/patch-release-team
- Email: kubernetes-patch-release-team@googlegroups.com
- Schedule:
- https://git.k8s.io/sig-release/releases/patch-releases.md
- Should we establish a more regular cadence? Eg: specific day / week of each month?
- [Abubakr Sadik] Like Microsoft Patch Tuesday: could the set of k8s-related projects put out their patch releases at the same time, to make it easier for consumers to know when to look at security issues and apply updates?
- [jberkus] on other projects, a regular (2-month?) cadence has been successful. Should the cadence be less frequent than every 2-3 weeks?
- [Stephen Augustus] monthly feels reasonable, beyond
- Action: Tim Pepper to send email to SIG Release, Patch Release Team, and Security Team proposing w/ lazy consensus committing to a monthly release cadence. If agreed, post specific proposal to k/dev for lazy consensus.
- [jimangel] Shadow Selection Survey timeline
- Context: https://kubernetes.slack.com/archives/C2C40FMNF/p1554819581205900
- Trying to outline the survey timeline / wrap up loose ends from 1.14 cycle
- [josh] Need to update release role handbooks with volunteering information
Date: March 26, 2019
Recording: https://youtu.be/u9gK3zWDO9M
Attending:
- [mariantalla] sig-release testgrid dashboards: will go through a thing that I would like to start, hastily described in http://bit.ly/sig-release-ci
- Desire is to bring (most?) of master’s dashboard config over to 1.14, and possibly prior
- The test boards don’t quite all match, some prior branches’ boards don’t have equivalent tests in master
- Aaron:
- suggests simplifying master first, then working back. Consider whether upgrade belongs on master-blocking (incredibly flaky, takes a long time, not run often), or should otherwise move to informing?
- Can we enforce with automation / linting / tooling that the boards are the same? Ultimately we might have boards generated so they’re consistent.
- New jobs should be expected in a board so they’re observed.
- Can testgrid config be split into multiple files to ungate OWNERS out to the corresponding SIGs?
- Now is the time...1.14 is done
- Claire: +1
- Jberkus: +1
- Jorge: +1
- Upgrade:
- This is a worry area for coverage.
- Today’s coverage is via the cluster/* bash kube-up scripting, which is fragile and poorly maintained by the community (down to one person, Justin Santa Barbara). It isn’t actually intended to be Google-specific; it’s fully open source, and all the runtime debug is available. Does anybody actually use this to do production upgrades in the real world?
- Kubeadm upgrade is coming-ish. There is a doc on how to use it for upgrade, but it’s also not well maintained. [Lubomir] SIG Cluster Lifecycle is planning their 1.15 cycle, which may include kubeadm upgrade.
- The cluster/* scripting does fancy testing to ensure an app keeps running across an upgrade, but there are still questions about the preferred ordering for upgrading nodes vs the control plane, and component ordering within nodes. If we just use kubeadm to quickly test that a KinD cluster was upgraded to a KinD cluster, this doesn’t show that the simplistic cluster was actually functional.
- What is the open source canonical preferred cluster configuration and upgrade? Cluster API is also too early to depend on and is unlikely to have upgrade tests in 1.15 cycle.
- Can we get more people supporting the bash scripting, or agree that it’s not used/supported and will just be a release-informing test case?
- When (today?) is a kubeadm open source upgrade path ready to be called supported, preferred and release blocking? It has an upgrade option today.
- Proposal agreed: move upgrade tests to informing, continue discussions in community on what has potential to be a release blocking upgrade suite.
- [Josh] Shadow applications for 1.15 RT.
- Need to get started on shadow selection process
- Our apprenticeship shadow mentoring is not completely working. For 1.15 we’re going to do things slightly differently. Josh is going to follow the set of shadows more closely and work to ensure we are sufficiently building them up. In future cycles we may formalize this “head mentor” role somewhere (under the release lead shadows?)
- Should we continue with the form/questionnaire for prospective shadows? Folks seem to agree; possibly an issue is open on improving it: https://github.com/kubernetes/sig-release/issues/454
- Form should be posted tomorrow or Thursday as new folks are commenting on https://github.com/kubernetes/sig-release/pull/548 wanting to participate
- [Claire] Anything else for 1.15 release team?
- Team leads are set: https://github.com/kubernetes/sig-release/pull/548/files
- Prelim schedule:
- Begin April 8
- Similar to prior spans for enhancements and code freezes
- Release target June 24
- Awaiting retro feedback
- A KEP merged in an implementable state will continue to be a requirement; it’s not a net-new one, so it should be clear to people (no grace period)
- 1.14 retrospective reminder: https://bit.ly/k8s114-retro Thursday at the community meeting
- There was a loooot proposed after 1.13 and some notable things went quite well in 1.14
- On time, not crazy last minute drama
- Quite clean CI signal across most of the cycle and especially at the end
- KEPs!
- Rel Eng:
- We still have 3+ build options:
- k/release via Google (current in use)
- k/release via community (no rpm signing, no publishing to official repos)
- k/k/build (needs love)
- New / from scratch artifact generator https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/20190227-artifact-generation.md
- Also #WG-K8s-Infra working in this area
- CNI update package breakage
- Showed up today in https://k8s-testgrid.appspot.com/sig-release-misc#periodic-packages-install-deb (24hrs late)
- Folks also reported in slack and on GitHub
- Options:
- Issue new packages (iterate on the package building scripting): 1.11.6-0 -> 1.11.6-1, for all the “supported” older name-version-releases (1.11.*, 1.12.*, 1.13.*; kubectl, kubeadm, kubelet, cri-tools, kubernetes-cni).
- General consensus to not do this.
- From Jeffrey Sica to Everyone: (02:46 PM): I feel like we try to cover too many edge cases. The vast majority of people are going to want a cleaner upgrade path to 1.11.7, not cherry-pick upgrade dependencies for 1.11.6
- Tell users how to work around on the command line with flags to yum/dnf/apt/rpm/dpkg and manual downloading.
- Current packages do work.
- This info has flowed in the GitHub issues and on slack.
- Remove older versions from repos; these are not supported and could have security issues. Publish just the head instance of each component build.
- What if somebody wants to use them, for reasons...do they need to build their own, maintain their own mirrors...probably are already?
- Act like a distro: maintain the Kubernetes project's preferred integration set.
- Can we give meaningful support for the install/runtime variations of all the dependency packaging, testing, integrations?
- Jordan Liggitt has posted a relevant discussion:
- https://groups.google.com/forum/#!topic/kubernetes-sig-release/LtRKMcwfBBE
- https://docs.google.com/document/d/1WA8N7C48nkJmme9a96DU0o9jBpeycPhht8WF-Eam9QQ/edit#
- This doc will help us more clearly define external dependencies and their management
- Do we really need the packages? Who uses them? There are actually people using them. Why aren't they using their OS's packages? The WG LTS survey will hopefully give us some info on this.
- TBD: Does removing older packages need broader buy in from the community?
- Update docs: Packages today are best effort, and that best effort is at the head of the release branches.
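As a concrete illustration of the command-line workaround option above, a hedged sketch using only standard apt/yum version pinning; the package versions shown are illustrative, not the exact ones affected by the breakage:

```
# Debian/Ubuntu: install explicit versions so apt does not resolve the broken
# dependency chain (versions illustrative):
sudo apt-get install -y kubelet=1.13.1-00 kubeadm=1.13.1-00 kubernetes-cni=0.6.0-00

# RHEL/CentOS: the same idea with yum's name-version syntax:
sudo yum install -y kubelet-1.13.1-0 kubeadm-1.13.1-0 kubernetes-cni-0.6.0-0

# Alternatively, download the .deb/.rpm artifacts by hand and install them
# directly with dpkg -i or rpm -i.
```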
Date: Feb 26, 2019
Recording: https://youtu.be/Rx3n1Pg2HLw
Attending:
[marpaia] Release Team update
- Enhancements
- 34 slated, 1 w/o KEP
- Out-of-tree handling: marketing, relnotes, comms
- 15 at-risk (no test plan)
- CI Signal
- Follow on repeated test failures for specific SIGs
- Making sure things are appropriately labeled
- Bug Triage
- Not enough issues in milestone
- Gaining better signal prior to freeze
- Test Infra
- Looking good
- Docs
- Looking good
- Release Notes
- Holding for Jeefy to discuss
- Communications
- Out-of-tree discussions: how best to handle
- Blog posts?
- Branch Mgmt
- Daily fast-forward (see the sketch below)
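A rough sketch of what the daily fast-forward amounts to, assuming an 'upstream' remote pointing at kubernetes/kubernetes and an illustrative branch name; in practice the branch manager drives this via the tooling in k/release rather than by hand:

```
# Fast-forward the release branch to match master:
git fetch upstream
git checkout release-1.14
git merge --ff-only upstream/master   # refuses to merge if the branch has diverged
git push upstream release-1.14
```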
- Go 1.12
- https://github.com/kubernetes/kubernetes/issues/73286
- Now that Go 1.12 is out, Kubernetes 1.10 and 1.11 are built using unsupported versions of Go
- If someone is willing to attempt to bump master to Go 1.12, Aaron is supportive of it (if it lands before Friday); he has no opinion on the earlier branches
- Minimal bandwidth to take this on
- SIG Scalability jobs as the signal
- Hesitant to introduce major change this close to Code Freeze
- [justaugustus] Seems we don’t have a policy for this and we should
- [calebmiles] We should figure out how to implement this. Maybe working with Hannes on a pipeline to make this happen.
- Bandwidth concerns
- Let’s punt to 1.15
- [tpepper] Patch Release Team comms
- Issue: https://github.com/kubernetes/sig-release/issues/516
- [justaugustus] Already created a Google Group for this
- Need to update the list and issue
- [caleb] Could we maybe choose something like Discuss, which might be more inclusive?
- [jeefy] Release notes
- The purpose of release notes and the reality of Themes
- Better release notes formatting
- Comments on out-of-tree release notes: https://github.com/kubernetes/sig-release/issues/486
- Comments on release notes in general: https://github.com/kubernetes/community/issues/484
- Move Major Themes to Kubernetes Blog?
- [spiffxp] Relnotes should have the authority to help define the theme
- [caleb] As a project, we should be able to understand/define what Kubernetes is (SIG Arch) and allow that to help guide themes
- [augustus] 1.15 Planning
- [augustus] 1.15 Enhancements + Comms
- Submitting Enhancements
- Staffing
Date: Feb 12, 2019
Recording: https://youtu.be/Nb7SxI6dJrU
Attending:
[tpepper] Jan 31: gave SIG Release update at the community meeting (Slides); next due April 18
- New subproject definition work:
- [tpepper] Release-engineering:
- https://github.com/kubernetes/sig-release/tree/master/release-engineering
- Vision: https://docs.google.com/presentation/d/1usEVcXJirUXCYr7DIBnS7QFZwxzniOEXlm_ek-yy85M/edit?usp=sharing
- [Linus Arver] Question from https://github.com/kubernetes/sig-release/issues/471#issuecomment-457886296: what is the difference between k/release and k/sig-release/release-engineering?
- [justaugustus] Licensing:
- [justaugustus] KEPs:
- These have been mostly optional; required in 1.14
- Retrospective for 1.14’s enhancements freeze and KEPs handling to be scheduled soon via SIG PM (ping Stephen Augustus to get invited)
- KEP template feedback. Please provide thoughts:
- [jberkus] Cleaning up sig-release CI boards.
- Testgrid’s SIG Release board split into “blocking” and “informing”
- Also some tests moved over to other SIG’s boards
- Owners of tests: There is an open issue to define an owner per test. Ones that don’t have an owner will be removed from release blocking.
- Should we carry the updated layout over to the other versioned boards for the prior (1.13, 1.12, 1.11) release branches, or leave them as-is, so that starting with 1.14 the layout would be different?
- Decision made to leave the old branches as-is. 1.14’s branch will be the first to receive the changed layout from the master branch.
- [tpepper for Nishi Davidson] Out-of-tree features: https://github.com/kubernetes/sig-release/issues/486
- Tracking requirements are not well documented
- Base assumption that out of tree features would be held to a similar or better standard relative to k/k’s practices and processes
- How to coalesce information on status and changes from the out-of-tree work?
- What is the output of the release team? A designation that k8s works abstractly? Works in conjunction with specific tested dependencies as integrated in testing? ...Is Kubernetes a kernel or distribution?
- Should SIGs own their things and be done with it?
- SIG AWS had feature they wanted to go straight to GA, skipping alpha/beta
- SIG Windows felt they were ready to declare an initial minimal viable state as GA for Windows containers; the rest of the community didn't agree. It felt to the SIG as if the bar for acceptance was constantly getting higher across 9 months
- We need to establish criteria, document it
- Vendors / distributions would need to validate their specific distro reference integration
- Core validation today is against GCE and AWS (in a funkily coupled way), need to improve that to be both broader (more coverage), and narrow validation (mocked?) of the interface
- Changelog historically has been about more than just the core Kubernetes, implying reference integration. Sections give info to downstream users and vendors to help understand changes of the point interfaces and components.
- If cluster/* is removed and replaced with a declarative component list that feeds test automation for release... the situation becomes clearer.
- Do KEPs capture out of tree information? There should be (per Stephen / SIG PM) a KEP for things even outside k/k. The release team likely would not track those KEPs.
- We need a communication mechanism to drive change information out to the ecosystem.
- [dims] How do we handle features released/integrated later? Especially something that is a community project (not a vendor/distribution).
- [Caleb] Subprojects sponsored by SIGs would clearly be "part of Kubernetes" and included in the vendor flow. If there is no SIG-${vendor provider}, rather providers are subprojects of SIG Cloud Provider, then the Cloud Provider abstraction is the primary statement in the comms flow.
- [dims] what about things like KinD or Minikube that follow? Trust but verify? As opposed to gating release on their validation data.
- [Caleb] Deferred commitment for inclusion of any comms from across all of k-org: if things are ready by some cut-off date (~code freeze?), their comms are included in those of the k/k release; otherwise they've missed the train.
- [Stephen] does this mean release team has to track everything in all of k-org?
- [Claire] if being tracked, what are the criteria? Implementable KEP and issue in k/features repo? Ie: release team tracks k-org (highlight?) features. Consensus: Yes.
- [Caleb] release team controls a valuable resource. Projects have no right to it. It is a privilege. If a SIG wants their work on that 1.XX release train, they need to engage in the 1.XX release process.
- [dims] How to convey this requirement and get buy in broadly?
- [Jim Angel] is there similarity to SIG Docs criteria for deciding what is in the official docs and website?
- [Dhawal / Lubomir] How to create a common declarative YAML indicating interdependencies (and replace cluster/*).
- Lubomir’s thinking of a KEP maybe later in 1.15?
- Caleb has some ideas on possible shape of implementation.
- “All the information is out there”, except there are multiple places and sometimes they declare information differently (one part integrates with verX and another with verY).
- [dims] see also:
- [Dhawal] When would a k-org KEP be required in implementable state to get sufficient core review ahead of a core release cycle?
- [tpepper] There is general awareness, starting in the 1.14 cycle, of the need for KEPs, probably even beyond just k/k's contributors. As we move further toward splitting the monolith (today we are only starting to split), i.e. once the split is mostly done, contributors will probably understand the need to get on the train and wouldn't need to have their KEP ready months in advance.
Date: Jan 29, 2019
Recording: https://youtu.be/ZiZiuaN5pGE
Attending:
Subprojects: need definition work
- https://github.com/kubernetes/sig-release/pull/453/files#diff-322d9f3e61a4edcf0b1f9a8c0802b107 is just placeholders
- Releng:
- Packaging (deb/rpm) discussion
- SIG Cluster Lifecycle is driving actions on unifying the build side
- SIG Release needs to drive on the official builds’ publication, hosting (in conjunction with #wg-k8s-infra), and signing.
- Interns: do we have specific items for which we could use an intern and can commit mentoring resource(s)?
- Google Summer of Code (GSoC): these are supposed to be defined with specificity, but if we write one up in a GitHub issue, we need it held for the applicant to actually do, versus somebody just doing it in the meantime. (Deadline: Feb 6.)
- Josh Berkus is submitting for a spring or summer Outreachy intern, for more intensive projects either for SIG Release or Test Infra. Outreachy can be a 2-3x larger-scope project versus GSoC
- Visualization related to test results might be something for which a larger pool of applicants have skills already.
- Ben Elder’s interested in mentoring, having benefited from past GSoC participation
- Finding committed mentors is a key starting point.
- Caleb: automated daily go/no-go reports, visualization of current project health, day-to-day deltas in the health metric. Can help with scoping an idea on this.
- Airtable work could fit with GSoC.
- Machine readable dependencies declared across org, consumable by a tool for change management (release notes)
- Update the bash scripting in k/release to something more modern and tied to CI/CD automation. This (and k/k/hack) is hard to maintain and hard for newcomers to approach, which impacts the project's long-term sustainability. Doing this well also hinges on a solid definition of "what is Kubernetes" and "what is releasing Kubernetes"
- KubeCon: intro and deep dive sessions…
- EU / Barcelona:
- Intro: Tim willing to do our “standard” intro, who’d like to help update the deck and co-present?
- Deep Dive: topic? volunteer(s)?
- Rel eng subproject status update
- Patch release management / team update
- BoF w/ #wg-k8s-infra
- Container image promoter tool and associated build/release workflow changes?
- Packaging improvements
- China / Shanghai
- Intro: volunteer(s) for the standard intro?
- Josh Berkus could do it, but a new person would be better; an Asian member of SIG Release would be ideal. Josh to make sure it gets submitted.
- Deep Dive: topic? volunteer(s)?
Date: Jan 15, 2019
Recording: https://www.youtube.com/watch?v=Al1ClqqXyHk
Attending:
Proposed v1.10.13 release [@liggitt]
- Proposal: single commit to fix a destabilizing regression introduced in v1.10.12
- CI jobs for release-1.10 are still intact and running (https://testgrid.k8s.io/sig-release-1.10-blocking) with this regression being the red line of FAILING. N-3’s CI jobs are removed around week 5 of the current release’s cycle.
- Discussion in slack and in the PR on precedent of cutting an n-4 release
- Need to be cognizant of this perhaps introducing a slippery slope, and of the past month's worth of PRs having been declined because 1.10 was "done". Documentation is sufficiently clear today in most people's opinions; for this PR, note the justification in its GitHub history.
- AI: @liggitt to summarize discussion in PR
- AI: document interaction between spinning down CI for an n-4 release branch and inability to cut new release
- AI: document/highlight (where?) the caution required on the last patch release of a release branch (when no future patch releases to fix a regression can be expected); definitely more is needed in the patch release manager handbook, but also in other more visible places
- Discussed criteria for acceptance of this PR:
- Supporting CI jobs are still in place
- The issue being fixed was a regression in a patch release
- The issue being fixed is of sufficient severity
- The fix is well understood and contained (doesn’t introduce risk of additional regressions)
- Jim Angel is opening an issue in SIG Release to track formalizing/documenting these criteria
- Decision: move forward with PR
- The 1.14 release team shadows questionnaire is out; folks should submit their interest in volunteering so they can be reviewed ASAP (Friday?)
- Propose staffing subprojects, create area/foo labels for them [spiffxp]
- hyperkube
- Currently k/cluster/images/hyperkube/OWNERS has approvers ixdy, luxas, mikedanese
- release-team
- Currently k/release/OWNERS + k/sig-release/release-team/OWNERS
- Propose splitting into 2 subprojects: release-engineering = k/release/OWNERS (ref: Oct 9 meeting), release-team = k/sig-release/release-team
- Currently k/sig-release/release-team approvers = SIG Release Chairs + 4 Release Team Leads
- Propose Current Release Team Lead + people, volunteers?
- Currently k/release/OWNERS approvers = ixdy, listx, mikedanese + SIG Release Chairs (out of date) + Patch Release Managers (back to 1.8) + Release Leads (back to 1.8)
- Propose Current Release Team Lead + people, volunteers?
- publishing-bot
- Currently k/publishing-bot/OWNERS is caesarxuchao, sttts
- Propose adding nikhita to approvers
- Sig-release (not actually a subproject)
- As-is
- Licensing
- Suggestions for licensing? justaugustus is usually the point person in fielding responses; is this a subproject or just a long-standing sig-release issue?
- Leads: justaugustus, nikhita, Steve Winslow (LF)
- Release Engineering
- Have OWNERS of this subproject distinct from “release-team” subproject, eg: start with Branch Managers + PRMT (Patch Release Mgmt Team)
- Process and procedures around release tooling and artifacts
- Overlaps with #k8s-infra-team and #sig-testing perhaps, and especially #sig-cluster-lifecycle
- The members of this subproject would be OWNERS of https://git.k8s.io/release repo
- Others:
- PST (Product Security Team)? Who really owns this? Their docs live in k/sig-release, but they are a bit of an outlier in where they fit in the governance model. Not SIG Release's issue to resolve right now; rather the steering committee's. TBD... table for now.
- Security? There is wg-security-audit, distinct from incident response (Product Security Team). But there's artifact signing, the release threat model, provenance, etc. These could fall under PST, or under the release engineering subproject. TBD... table for now.
- "Project support stance / release cadence", aka WG LTS: were it not to get WG approval, would it be a SIG Release subproject? TBD... table for now.
- FYI Jan 31 SIG Release update at community meeting...need to start drafting
- KEP for Image Promotion Process and efforts behind the Container Image Promoter (open sourced here) -- Linus Arver (listx)
- Q: Do test images follow this process too? (@patricklang - can't stay, will check notes or Slack if not covered today)
- A: Yes, if those images land in k8s.gcr.io (aka gcr.io/google-containers).
- Q from #k8s-infra-team: Is the goal for everybody to use a blessed GCR image? Or a model where the blessed repo is configurable?
- A: The proof-of-concept tool (Container Image Promoter) is still at an early stage; for now the goal is just to demonstrate how the image promotion process works for some arbitrarily small set of images. It may be the case that there will be multiple instances of CIP running at the same time to handle different repositories/environments. But AFAIK the current model is that we have blessed GCR images in k8s.gcr.io which everyone uses (see the sketch after this Q&A).
- Q: Which images are in scope versus out of scope for this?
- A: Initially the core images. The goal is to demonstrate how this would work for a small set of images and then expand later as necessary.
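To make the promotion model concrete, a hand-wavy sketch of what promoting a single image amounts to at the registry level; the staging repository name is hypothetical, and the actual Container Image Promoter works from declarative manifests that pin images by digest rather than from ad-hoc commands:

```
# Conceptually: copy a vetted image from a staging repo to the blessed repo.
docker pull gcr.io/k8s-staging-example/foo:v1.0.0    # staging repo name hypothetical
docker tag gcr.io/k8s-staging-example/foo:v1.0.0 k8s.gcr.io/foo:v1.0.0
docker push k8s.gcr.io/foo:v1.0.0
```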
- Cleaning up blocking test boards [jberkus]
- spiffxp notes the sig-release-misc board could be good to keep around, for example for any tests that are useful for the various SIG Release subprojects (e.g., the rel eng subproject will hopefully create a lot of tests for the release mechanisms)
- Owners: some things span SIGs or are handled mostly by folks who are perhaps mostly in SIG Testing. By declaring a SIG Release owner for a job, that owner would be responsible for chasing resolution from whoever is the right person. Alternatively, put them under SIG Testing? We primarily need a responsible party who will see the notification of a test breaking and start driving it to resolution. We're still hoping that group membership remains curated and that group-based email notifications lead folks to filter and highlight these in-bound notifications during their tenure as owners. It is then also possible to start measuring notification-to-response lags.
- Part #3 of the 1.13 release retrospective... discuss the final couple of undiscussed bullets and the next-release actions table