
SIG Node Meeting Notes

Future

  • Monitoring pipeline proposal
  • Virtual Kubelet (@rbitia)
  • PR Review Request - #63170 - @micahhausler
  • Image name inconsistency in node status - #64413 - @resouer
  • [Jess Frazelle / Kent Rancourt]: Proposal for a kubelet feature to freeze pods that the replication controller places in a hypothetical "frozen" state in response to a scale-down event, and to thaw pods via socket activation.

Dec 18

  • Graduate SupportPodPidsLimit to beta (Derek Carr)

  • UserNS remapping: updated proposal, implementation PR (vikasc):

    Questions raised in the Zoom chat (a per-pod mapping sketch follows the excerpts below):

  • From Mike Danese to Everyone: (10:42 AM)

    I would like to reconcile the "Motivation" section with how we want people to use users and groups in general even without remapping. We want tenants to run workloads as different users and groups to segment their disk access to improve security, and we want userns to support compatibility with images that expect to be running as a given uid/gid. The current proposal uses a file in /etc/... which makes getting both hard. Any path towards both these goals? ("yup", "exactly")

  • From Patrick Lang to Everyone: (10:44 AM)

    bf48175c42/contributors/design-proposals/node/node-usernamespace-remapping.md (sandbox-type-runtimes) - this proposes using node selectors to select nodes that can support this. Should this be designed to work with RuntimeClass instead, to avoid using node selectors?

  • From Mike Danese to Everyone: (10:45 AM)

    Single mapping on a node seems problematic IMO. Namespace would be better. Per pod would be best. I'd like to see that explored. Thanks.

  • From 刘澜涛 (Lantao Liu) to Everyone: (10:46 AM)

    Just some background: it is possible to support per-pod user namespace mapping in containerd. It is already supported there; we just need to add the corresponding CRI implementation and kubelet support.
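
For context on the per-pod mapping idea above, here is a minimal sketch in Go, assuming such a mapping would ultimately surface as OCI runtime-spec UID/GID mappings for the pod's containers. The per-pod host-range allocation (`podIDMappings`) is hypothetical; the CRI and kubelet plumbing is exactly what was still open in this discussion.

```go
package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// podIDMappings returns hypothetical per-pod UID/GID mappings: each pod gets
// its own 65536-wide range on the host, so container UID 0 maps to a distinct
// unprivileged host UID per pod.
func podIDMappings(podIndex uint32) ([]specs.LinuxIDMapping, []specs.LinuxIDMapping) {
	base := 100000 + podIndex*65536 // hypothetical host-range allocation scheme
	m := []specs.LinuxIDMapping{{
		ContainerID: 0,    // UID/GID 0 inside the container...
		HostID:      base, // ...maps to an unprivileged host ID
		Size:        65536,
	}}
	return m, m
}

func main() {
	uids, gids := podIDMappings(2)
	spec := specs.Spec{
		Linux: &specs.Linux{
			Namespaces:  []specs.LinuxNamespace{{Type: specs.UserNamespace}},
			UIDMappings: uids,
			GIDMappings: gids,
		},
	}
	fmt.Printf("uid mappings: %+v\n", spec.Linux.UIDMappings)
}
```

A per-pod scheme like this avoids the single node-wide mapping that the chat flags as problematic, at the cost of the node having to manage many host ID ranges.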

Dec 11

Cancelled due to KubeCon Seattle

Dec 4

Nov 27

  • Adding Termination Reason (OOMKilled/etc) Event or Counter (Brian)

    • Following discussion in sig-instrumentation, discuss either
      • A. generating a logline / event for Pod termination reason or
      • B. adding a termination reason counter in cadvisor/kubelet that can export to Prometheus
    • Want to get better metrics when a container is killed.
    • Want to get a count of all the termination reasons.
    • Pod has last state, which has a termination reason
    • kube-state-metrics has a termination reason, but as soon as the pod restarts it's gone
    • would be nice to have a counter for how many containers have been killed for each possible reason (OOM, etc.)
    • at minimum would be nice to have an event with the termination reason
      • on top of this, it would be nice if cadvisor or the kubelet could export a counter (e.g. for these events); see the sketch after this list
    • Relevant Issue is: "Log something about OOMKilled containers" https://github.com/kubernetes/kubernetes/issues/69676
  • Node-Problem-Detector updates (wangzhen127)

    • adding a few new plugins:
      • kubelet, docker, containerd crashlooping issues
      • more checks on filesystem and docker overlay2 issues
    • plans on making NPD easier to configure within k8s clusters
  • wg-lts, a new working group (Dhawal Yogesh Bhanushali, dbhanushali@vmware.com)
    • Mailing list: https://groups.google.com/forum/#!forum/kubernetes-wg-lts
    • PR: https://github.com/kubernetes/community/pull/2911
    • Survey: https://docs.google.com/document/d/13lOGSSY7rz3yHMjrPq6f1kkRnzONxQeqUEbLOOOG-f4/edit
    • Slack: #wg-lts
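
As a rough illustration of option B from the termination-reason item above: a Prometheus counter vector keyed by termination reason that a kubelet- or cAdvisor-side component could increment whenever it observes a terminated container. The metric name, label, and port below are illustrative assumptions, not an agreed convention.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// containerTerminations counts terminated containers by reason (OOMKilled,
// Error, Completed, ...). Metric and label names here are illustrative only.
var containerTerminations = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "container_terminations_total",
		Help: "Number of container terminations, partitioned by reason.",
	},
	[]string{"reason"},
)

func main() {
	prometheus.MustRegister(containerTerminations)

	// Wherever container status is observed (e.g. in the kubelet's status
	// loop), the last terminated state's reason would be counted:
	containerTerminations.WithLabelValues("OOMKilled").Inc()

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9102", nil)
}
```

Unlike the per-pod last-state field or kube-state-metrics, a counter like this survives pod restarts, which is the gap called out above.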

Nov 20

Meeting canceled due to USA holiday (Thanksgiving)

Nov 13

no agenda

Nov 6

Oct 30

Oct 9

Oct 02


Sept 18

Sept 11

  • Kubelet Devices Endpoint (device monitoring v2)
  • Fix port-forward for non-namespaced Pods (@Xu)
    • https://docs.google.com/presentation/d/1x1B2_DFZ9VI2E_-pB2pzeYKtUxrK4XLXd7I0JWD1Z40/edit#slide=id.g4189217af3_0_58
    • Related to: containerd shimv2 + kata etc
    • some comments in the meeting:
      • The kubelet should not see containers/images that are visible to it but not managed by it. So if we want a solution like method 3, those containers should not be visible to the kubelet at all; method 1 looks fine from the kubelet's perspective.
      • There is a debug-container proposal under discussion, which works quite similarly to method 3.
      • SIG Node would like to revisit the port-forward mechanism itself at the architecture level(?); however, the feature is required by many users, such as OpenShift, so it is essential.
  • RFC Improve node health tracking - https://github.com/kubernetes/community/pull/2640
    • No discussion necessary at the moment

Sept 4

Aug 28

Aug 21 proposed agenda

Aug 14

  • Summary of the New Resource API follow-up offline discussions (@vikaschoudhary16, @jiayingz, @connor, @renaud):
    • We will start with multiple-matching model without priority field.
    • We will start with allowing ResourceClass mutation.
  • Summary of the New Resource API user feedback (@vikaschoudhary16, @jiayingz):
    • tl;dr: we received feedback from multiple HW vendors, admins who manage large enterprise clusters, and k8s providers representing large customer sets that the new Resource API is a useful feature that will help unblock some of their important use cases

Aug 07

July 31

July 24

  • Agenda Topic (owner)
  • [tstclair & jsb] CRI versions and validation
    • Docker 18.03-ce is the only installable version for Ubuntu 18.04

July 17

  • Device "Scheduling" proposal (@dashpole) slides
  • New Resource API KEP proposal (@vikaschoudhary16 and @jiayingz)
    • Discussed the user stories we would like to enable through this proposal
    • Discussed goals and non-goals. In particular, discussed why we proposed to use ResourceClass to express resource property matching constraints instead of directly expressing such constraints in the Pod/Container spec. These reasons are documented in the non-goals section (a rough sketch of the ResourceClass shape follows this list).
    • Question on whether it is common for people to create a cluster with multiple types of GPU, or whether people usually create multiple clusters, each mapped to one GPU type.
      • Answer: Yes, it is common. We have seen such customers from OpenShift, GKE, and on-prem.
    • Bobby: When we start to support group resources and the ComputeResource-to-ResourceClass matching behavior becomes many-to-many, that would bring quite a lot of complexity and scaling concerns to the scheduler. People have already seen scheduler scaling problems when they use a very large number of extended resources.
      • Answer: scaling is definitely a concern. We don't plan to support group resources in the initial phase. We need to collect scaling and performance numbers and carefully evaluate them after the initial phase before moving forward. We should publish performance numbers and best-practice guidelines to keep people from misusing the building blocks we put in. We should also think about the mechanisms necessary to prevent people from getting into a "bad" situation.
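
For reference, a rough sketch of the kind of object the New Resource API discussion above revolves around, expressed as Go types. The field names (`ResourceName`, `PropertySelector`) are illustrative guesses based on these notes, not the KEP's actual schema.

```go
package resourceclass

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ResourceClass is a hypothetical cluster-scoped object that selects extended
// resources (e.g. GPUs) by their properties, so pods can request the class
// instead of encoding device properties in the Pod/Container spec.
type ResourceClass struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec ResourceClassSpec `json:"spec"`
}

// ResourceClassSpec is an illustrative shape, not the proposal's actual schema.
type ResourceClassSpec struct {
	// ResourceName is the extended resource being matched, e.g. "nvidia.com/gpu".
	ResourceName string `json:"resourceName"`
	// PropertySelector matches device properties advertised by the node,
	// e.g. {"memory": "16GiB", "family": "tesla"}.
	PropertySelector map[string]string `json:"propertySelector,omitempty"`
}
```

The point of this shape is the one argued above: property-matching constraints live in a cluster-level object rather than in the Pod/Container spec.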

July 10

July 3

Cancelled

June 26

June 19

June 12

June 5

Cancelled

May 29

May 22

  • Sandboxes API Proposal (@tallclair)
    • Q: Do we expect cluster admins to label nodes that support sandboxing?
      • It's not clear whether we should require all container runtimes to support sandboxing, or whether this is an optional feature.
      • This doesn't block alpha.
      • Ideally each node will only support one sandbox technology, but if users want to use a different sandbox technology, they may need a different node setup and to label the nodes accordingly.
    • Q: Will this be included in conformance test?
    • Q: Can the sandbox be a runc implementation which meets the conformance?
      • It is hard to define a conformance test suite to validate whether the sandbox meets the requirement.
      • We can only provide guideline for now.
      • We are still working on the definition, but one probable definition: an independent kernel, and two layers of security?
    • If the sandbox definition is not clear, Derek is worried that:
      • User may never or always set Sandbox=true.
      • If sandbox is only a boolean, when sandbox=true, a workload may work for some sandbox technology, but not work for another sandbox technology.
    • Q: Why not just expose an OCI-compatible runtime? That would give us something similar to the sandbox boolean, with more flexibility (see the sketch after this list).
      • For example, we could have an admission controller do the validation and set defaults for an OCI-compatible runtime, which does not necessarily have to live in core Kubernetes.
    • Q: What is the criteria to graduate this to beta/GA?
      • The alpha version is mainly to get user feedback; it is not a finalized API.
    • Q: Is this a Kubernetes 1.13 evolution? Or a longer term project?
      • A: 1.13 would be good if there is a strong signal that this works, but expect this to take longer. We would like to see this get into alpha in 1.12.
    • Q: Is SELinux blocked when sandbox=true?
      • AppArmor and SELinux are not blocked; the guest OS can support them to provide better isolation between containers inside the sandbox.
    • Q: Instead of changing all "Sandbox" references in CRI, can we rename it to something more specific like "Kernel-Isolated", which would also make the API less confusing? "Sandbox" is too vague.
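
To make the two API shapes debated above concrete, here is a hedged sketch; both structs and field names are hypothetical illustrations of the discussion, not fields from the actual proposal.

```go
package sandboxapi

// Shape 1, as in the proposal under discussion: a single boolean, leaving the
// choice of sandbox technology to the node (hypothetical field name).
type PodSpecWithBool struct {
	Sandboxed bool `json:"sandboxed,omitempty"`
}

// Shape 2, the alternative raised in the meeting: name the runtime explicitly,
// with an admission controller doing validation/defaulting outside core
// Kubernetes. Field name here is illustrative.
type PodSpecWithRuntime struct {
	RuntimeHandler string `json:"runtimeHandler,omitempty"` // e.g. "runc", "kata", "gvisor"
}
```

The second shape is similar in spirit to the RuntimeClass idea referenced elsewhere in these notes; Derek's concern about workloads behaving differently across sandbox technologies applies mainly to the boolean form.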

May 15

May 8

  • Cancelled due to KubeCon and OpenShift

May 1

April 24

April 17

  • Re-categorize/tag Node E2E tests (proposal, @yujuhong)
    • We need to consider whether to include tests like the eviction test in the conformance suite. They cover essential features, but the tests themselves have special requirements: 1) they need to be run in serial; 2) they make some assumptions, such as about the node size.
  • Virtual Kubelet (@rbitia) Updates
  • Add probe based mechanism for kubelet plugin discovery: https://github.com/kubernetes/kubernetes/pull/59963 (see the sketch below)
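
A minimal sketch of the probe-based discovery idea behind the PR above, assuming the general flow is: plugins drop a unix socket into a well-known kubelet directory, and the kubelet probes each new socket (in the real mechanism it would then perform a registration handshake over gRPC, omitted here). The directory path and the skipped handshake are assumptions.

```go
package main

import (
	"context"
	"log"
	"strings"
	"time"

	"github.com/fsnotify/fsnotify"
	"google.golang.org/grpc"
)

const pluginDir = "/var/lib/kubelet/plugins" // assumed well-known socket directory

// probeSocket dials a plugin's unix socket; the real mechanism would then call
// a registration RPC over this connection to learn the plugin's type, name,
// and supported API versions.
func probeSocket(path string) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	conn, err := grpc.DialContext(ctx, "unix://"+path, grpc.WithInsecure(), grpc.WithBlock())
	if err != nil {
		log.Printf("probe %s failed: %v", path, err)
		return
	}
	defer conn.Close()
	log.Printf("discovered plugin socket %s", path)
}

func main() {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()
	if err := w.Add(pluginDir); err != nil {
		log.Fatal(err)
	}
	for ev := range w.Events {
		// New .sock files dropped into the directory trigger a probe.
		if ev.Op&fsnotify.Create != 0 && strings.HasSuffix(ev.Name, ".sock") {
			probeSocket(ev.Name)
		}
	}
}
```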

April 10

  • Windows Service Accounts in Kubernetes (@patricklang)
    • https://github.com/kubernetes/kubernetes/issues/62038
    • How to support Windows Group Managed Service Account (gMSA)
      • WindowsCredentialSpec is a Windows-specific identity
      • Option 1: WindowsCredentialSpec references a secret
      • Option 2: WindowsCredentialSpec contains the JSON passed as-is in the OCI spec
      • No password information inside, only some information about the host.
  • Sandbox resource management - discussion questions
    • Pod resources design proposal: https://docs.google.com/document/d/1EJKT4gyl58-kzt2bnwkv08MIUZ6lkDpXcxkHqCvvAp4/edit
    • If the user sets the overhead, how should the runtime controller deal with it?
      • Reject the pod; this may cause problems on upgrade.
      • Overwrite it.
    • Derek: Can the node advertise/register the overhead to the control plane? Can the node report per-pod overhead in the node status, and let the scheduler make decisions based on that? (See the overhead accounting sketch after this list.)
    • Derek: Can we make kata a node resource, so that a pod that wants to run on kata needs to request that resource?
    • Derek: A pod-level limit is easier for the user to set. - Dawn: That may not hold in the future given where Kubernetes is heading, e.g. sidecar containers injected by Istio. - Derek: We can resolve the Istio issue by dynamically resizing the limit.
    • Dawn: Do we expect mixed sandbox and non-sandbox containers on a node? - Tim: Yes. The reason is that we do have system containers that can't run in a sandbox, e.g. fluentd, kube-proxy, etc.
    • Derek: Do we expect multiple runtimes on a single node, e.g. containerd, cri-o, etc.? - Dawn: No. Each node should have one container runtime, but it supports running containers with different OCI runtimes, e.g. runc, kata, etc.
    • Derek: What happens to best-effort pods? - Dawn: This is an issue for frakti today, so they still set a fixed limit.
  • Container runtime stream server authentication https://github.com/kubernetes/kubernetes/issues/36666#issuecomment-378440458
  • Add probe based mechanism for kubelet plugin discovery: https://github.com/kubernetes/kubernetes/pull/59963
  • Node-level checkpointing: https://github.com/kubernetes/kubernetes/pull/56040
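
A hedged sketch of the overhead accounting question raised above: if each sandbox runtime advertised a fixed per-pod overhead, the scheduler or kubelet would add it to the sum of the container requests when sizing the pod. The runtime names, numbers, and function below are illustrative only; how the overhead is set and who enforces it is exactly what the linked proposal is debating.

```go
package main

import "fmt"

// runtimeOverheadMi is a hypothetical per-pod memory overhead (in MiB) that a
// node or runtime controller could advertise for each sandbox runtime.
var runtimeOverheadMi = map[string]int64{
	"runc": 0,   // no sandbox, no extra overhead
	"kata": 160, // illustrative guest kernel + agent overhead
}

// effectivePodMemoryMi adds the runtime's overhead to the containers' requests,
// which is what the scheduler would need to account for when placing the pod.
func effectivePodMemoryMi(runtime string, containerRequestsMi []int64) int64 {
	var total int64
	for _, r := range containerRequestsMi {
		total += r
	}
	return total + runtimeOverheadMi[runtime]
}

func main() {
	fmt.Println(effectivePodMemoryMi("kata", []int64{256, 64})) // prints 480
}
```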

April 3

March 27

March 20

March 13

Mar 6

Feb 27

Feb 20

Feb 13

Feb 6

Jan 30

  • Revised Windows container resource management (CPU & memory):

  • Virtual Kubelet introduction

    • Owner: Ria Bhatia, ria.bhatia@microsoft.com

    • Context:

      Virtual Kubelet is an interface for plugging _anything_ into a Kubernetes cluster to replicate the lifecycle of a Kubernetes pod. I'd like to kick off discussion of the concepts of VK, the design of the project itself, and networking scenarios, and also come up with a formal provider definition (a provider interface sketch follows this list).

  • Node e2e conformance tests: https://github.com/kubernetes/kubernetes/issues/59001

    • Spreadsheet to re-categorize tests: please comment if you think a test should be added to or removed from the conformance suite. Please also suggest feature tags for tests.
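
As a possible starting point for the "formal provider definition" mentioned above, a hedged sketch of what a Virtual Kubelet provider interface could look like in Go. The method set is an illustration of the pod-lifecycle operations discussed, not the project's actual interface.

```go
package provider

import (
	"context"

	corev1 "k8s.io/api/core/v1"
)

// Provider is a hypothetical Virtual Kubelet backend: anything that can
// replicate the lifecycle of a Kubernetes pod outside a normal node.
type Provider interface {
	CreatePod(ctx context.Context, pod *corev1.Pod) error
	UpdatePod(ctx context.Context, pod *corev1.Pod) error
	DeletePod(ctx context.Context, pod *corev1.Pod) error
	GetPod(ctx context.Context, namespace, name string) (*corev1.Pod, error)
	GetPods(ctx context.Context) ([]*corev1.Pod, error)
	// GetPodStatus lets the virtual node report status back to the API server.
	GetPodStatus(ctx context.Context, namespace, name string) (*corev1.PodStatus, error)
}
```

The virtual-kubelet project defines its own provider contract; this sketch is only meant to frame the discussion about what a formal definition needs to cover (create/update/delete, listing, and status reporting).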

Jan 23

Jan 16

  • Review Pending v1.Pod API Changes [@verb]
  • Windows container roadmap and Windows configuration in CRI (Patrick.Lang, jiangtli @microsoft.com)
    • https://trello.com/b/rjTqrwjl/windows-k8s-roadmap
    • Windows container configuration in CRI:
      • https://github.com/kubernetes/community/pull/1510
      • Questions:
        • Question: Is there a pod-level sandbox isolation mechanism on Windows?
        • Answer: Currently there is only kernel-level isolation per container, with no hierarchical isolation like cgroups. They will work with containerd and cri-containerd to add hypervisor-level isolation at the pod level.
        • Dawn: Can we add e2e tests for the changes we make?
        • Dawn & Yuju: Can we have an overview of how resource management will work on Windows?
        • Dawn: Let's start with CPU and memory; in your current proposal there is no storage resource defined yet.
  • Review request: Node-level Checkpointing Manager (Vikas, vichoudh@redhat.com)
    • https://github.com/kubernetes/kubernetes/pull/56040
    • Just requires approval
    • It's good to have a common library instead of separate implementations in different parts of the kubelet. However, we should make it clear that we prefer not to add new checkpoints.
    • Derek: Maybe we can have a document to track all the checkpoints done by the kubelet; whenever people add a new checkpoint, they need to update the document. (A sketch of what such a common library's surface could look like follows this list.)
  • Short discussion about the node-fencing progress (bronhaim@, redhat)
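
For reference, a hedged sketch of the kind of surface a common node-level checkpointing library (as in the PR above) could expose to kubelet components. The interface and method names below are illustrative, not necessarily what the PR merges.

```go
package checkpointmanager

// Checkpoint is anything a kubelet component wants to persist across restarts,
// e.g. device manager allocations. Implementations own their own encoding.
type Checkpoint interface {
	MarshalCheckpoint() ([]byte, error)
	UnmarshalCheckpoint(data []byte) error
	// VerifyChecksum guards against partially written or corrupted files.
	VerifyChecksum() error
}

// CheckpointManager is the shared library surface: one implementation of
// atomic on-disk writes, instead of ad-hoc checkpoint code in each component.
type CheckpointManager interface {
	CreateCheckpoint(key string, cp Checkpoint) error
	GetCheckpoint(key string, cp Checkpoint) error
	RemoveCheckpoint(key string) error
	ListCheckpoints() ([]string, error)
}
```

A shared interface like this supports the point made above: one audited place for checkpoint I/O, plus a documented list of which components actually checkpoint.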

Jan 9

Jan 2

Cancelled.