Kubernetes SIG-Node CI subgroup notes

2022/12/14

Recording: https://www.youtube.com/watch?v=drlQWZiMj6o

PRs:

2022/12/07

  • swsehgal
    • Planning to bring this to the main SIG Node meeting next week, but was wondering whether this group has any suggestions on how this can be handled.
    • Do we have compute-optimized nodes in the CI infrastructure? C2-standard-60 (referenced here) provides VMs with multiple NUMA nodes, but I don't think we have them in our infra.
    • Any pointers?
      • Let's talk to k8s infra before taking an expensive machine like C2-standard-60.

      • Sergey: the two cheapest options with NUMA on GCP:
        • n2-standard-32 $908.47

              skanzhelev@n2-standard-32:~$ grep NUMA=y /boot/config-`uname -r`
              lscpu | grep -i numa
              CONFIG_NUMA=y
              CONFIG_X86_64_ACPI_NUMA=y
              CONFIG_ACPI_NUMA=y
              NUMA node(s): 2
              NUMA node0 CPU(s): 0-7,16-23
              NUMA node1 CPU(s): 8-15,24-31

        • n2d-standard-32 $790.49

              skanzhelev@n2d-standard-32:~$ grep NUMA=y /boot/config-`uname -r`
              lscpu | grep -i numa
              CONFIG_NUMA=y
              CONFIG_X86_64_ACPI_NUMA=y
              CONFIG_ACPI_NUMA=y
              NUMA node(s): 2
              NUMA node0 CPU(s): 0-7,16-23
              NUMA node1 CPU(s): 8-15,24-31
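
The `lscpu` check above could be automated when evaluating other machine types. The following is a minimal Python sketch, assuming the `lscpu` output format shown in the transcript. The names `parse_cpu_list` and `numa_topology` are illustrative and not part of any existing test-infra tooling.

```python
import re

# Matches per-node lines such as "NUMA node0 CPU(s): 0-7,16-23".
# The summary line "NUMA node(s): 2" does not match, because "node" is
# followed by "(" rather than a digit.
_NODE_LINE = re.compile(r"^NUMA node(\d+) CPU\(s\):\s*(.*)$")


def parse_cpu_list(spec: str) -> list[int]:
    """Expand a CPU list such as '0-7,16-23' into individual CPU ids."""
    cpus: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(lscpu_output: str) -> dict[int, list[int]]:
    """Map each NUMA node id to its CPU ids, based on lscpu text output."""
    topology: dict[int, list[int]] = {}
    for line in lscpu_output.splitlines():
        match = _NODE_LINE.match(line.strip())
        if match:
            topology[int(match.group(1))] = parse_cpu_list(match.group(2))
    return topology


# Sample taken from the n2-standard-32 transcript in these notes.
SAMPLE = """\
NUMA node(s): 2
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
"""

topology = numa_topology(SAMPLE)
is_multi_numa = len(topology) > 1
```

On a live node, the same function could be fed the output of `subprocess.run(["lscpu"], capture_output=True, text=True).stdout` instead of `SAMPLE`.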

  • SergeyKanzhelev

2022/11/23 [Canceled - short week in US]

2022/11/16

2022/11/09

2022/11/02 [Cancelled due to host unavailability]

2022/10/26

2022/10/19

Agenda:

  • Sergey, Swati
  • Brian
  • Mike

Bugs triage: 6 bugs

2022/10/12

Agenda:

  • Brian

2022/10/05

Agenda:

2022/09/21

Agenda:

I will resurrect this: https://github.com/kubernetes/test-infra/issues/24641

2022/09/13

Agenda:

  • Sergey

2022/09/07

Attendees:

Agenda:

2022/08/10

Attendees:

Agenda:

2022/08/03

Attendees:

Agenda:

2022/07/27 [Cancelled due to codefreeze]

Attendees:

Agenda:

  • danielle

2022/06/29

Attendees:

Agenda:

  • paco
  • paco

2022/06/22 [Cancelled, Zoom 2FA issues]

Attendees:

Agenda:

2022/06/15 [starts 15 minutes late]

Attendees:

Agenda:

2022/06/08

Attendees:

Agenda:

  • Triage mostly

2022/06/01

Attendees:

Agenda:

  • fromani

2022/05/25

Attendees:

Agenda:

  • (Vaibhav) Why are EvictionHard's imagefs.available and ImageGCHighThresholdPercent the same by default?
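
To make the question concrete, here is a small Python sketch of the arithmetic. It assumes the kubelet defaults are `imagefs.available<15%` for hard eviction and `imageGCHighThresholdPercent: 85`; verify these values against the kubelet version in use. The constant and function names are illustrative only.

```python
# Assumed kubelet defaults (an assumption of this sketch, not taken from the notes).
DEFAULT_IMAGEFS_AVAILABLE_PCT = 15        # evictionHard: imagefs.available<15%
DEFAULT_IMAGE_GC_HIGH_THRESHOLD_PCT = 85  # imageGCHighThresholdPercent


def usage_at_hard_eviction(min_available_pct: int) -> int:
    """Disk usage (percent) at which the hard eviction signal fires."""
    return 100 - min_available_pct


def gc_headroom_pct(min_available_pct: int, gc_high_threshold_pct: int) -> int:
    """Percentage points of usage between image GC starting and hard eviction."""
    return usage_at_hard_eviction(min_available_pct) - gc_high_threshold_pct


eviction_usage = usage_at_hard_eviction(DEFAULT_IMAGEFS_AVAILABLE_PCT)
headroom = gc_headroom_pct(
    DEFAULT_IMAGEFS_AVAILABLE_PCT, DEFAULT_IMAGE_GC_HIGH_THRESHOLD_PCT
)
```

Under these assumed defaults both thresholds correspond to the same disk usage, so `headroom` is zero: image garbage collection gets no window to free space before hard eviction can trigger.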

2022/05/04

Attendees:
![][image1]

Agenda:

04/27/2022

Attendees:

![][image2]

Agenda:

Thread to start discussion on planning reliability/maintainability improvements

Danielle will send something to the mailing list tomorrow.

Francesco:

04/20/2022

Attendees:

Agenda:

  • ehashman
    • Elana taking a break during 1.25, stepping down as CI subproject lead
    • Nominating Danielle to step up as new lead
  • Sergey
    • Arnaud: focus on jobs failing for more than a year
    • Everything under 90 days is not relevant
  • Sergey
  • Sergey
  • https://github.com/kubernetes/test-infra/pull/26000

04/13/2022 [Canceled due to lack of quorum and being in test freeze]

Please help with the release-blocking issue https://github.com/kubernetes/kubernetes/issues/109082 if you have cycles!

04/06/2022

Attendees:
![][image3]

- [arnaud] [https://github.com/cri-o/cri-o/pull/5777](https://github.com/cri-o/cri-o/pull/5777)

03/30/2022

Attendees:
![][image4]

03/23/2022

Attendees:
![][image5]

- [arnaud] The Great Migration to registry.k8s.io
- FYI: containerd config change:
  - https://github.com/kubernetes/test-infra/pull/25739
  - https://github.com/kubernetes/test-infra/pull/25742
  - I won't be around for the meeting. If you see any failures related to this change, please revert!
- [mmiranda96] Need review on https://github.com/kubernetes/kubernetes/pull/108862
- The Fedora job is passing now (https://github.com/kubernetes/kubernetes/issues/104292#issuecomment-1074417968)

03/16/2022

Attendees:
![][image6]

http://perf-dash.k8s.io/#/?jobname=gce-100Nodes-master&metriccategoryname=E2E&metricname=LoadResources&PodName=e2e-big-minion-group%2Fkubelet&Resource=CPU

![][image7]
http://perf-dash.k8s.io/#/?jobname=gce-100Nodes-master&metriccategoryname=E2E&metricname=LoadResources&PodName=e2e-big-minion-group%2Fkubelet&Resource=memory

![][image8]

03/09/2022 [Cancelled]

03/02/2022

Do we have an issue tracking this failure?
https://testgrid.k8s.io/sig-node-containerd#node-kubelet-containerd-performance-test
It still fails even though https://github.com/kubernetes/test-infra/pull/25385 is merged.
https://github.com/kubernetes/test-infra/issues/25430

Infra flakes a lot, do we have a bug?
Kubernetes Presubmits blocking
https://testgrid.k8s.io/presubmits-kubernetes-blocking

Kubelet memory increase:

![][image9]

![][image10]
The spike starts on 2022-02-24 UTC.

02/23/2022

  • matthyx
    • continue cleaning them, separate PR for gce

02/16/2022

02/09/2022

Status of CI: https://docs.google.com/spreadsheets/d/1IwONkeXSc2SG_EQMYGRSkfiSWNk8yWLpVhPm-LOTbGM/edit#gid=1187923038

02/02/2022

  • Mike
  • Danielle
    • for the disk pressure, might create a small tmpfs that's easier to fill up
  • Perf dashboard: there is a small bump in runtime memory. Let's check next week whether it follows the same pattern and goes back down.

![][image11]

01/26/2022

01/19/2022

01/12/2022

01/05/2022