community/sig-node/archive/meeting-notes-2016.md

sig-node weekly meeting

Dec 13

  • virtlet demo
  • cri-o demo
    • Antonio from redhat
    • pod and container lifecycle management works, and image service implementation is done
    • start integrating with Kubelet
  • CRI rollout and planning
  • Resource management workgroup
    • Starts Jan 3rd; expected to be dissolved once the roadmap and planning are done.
    • preemption & eviction priority and schema
  • Will the CRI shim for docker exclusively support CNI for networking?

Dec 6

  • Will CRI support exclusively CNI for networking?
    • CRI right now with dockershim's impl only supports CNI
    • Next step is to file a GitHub issue or ping on Slack, since we didn't get a clear answer here
  • Additional CNI question: no distinction between networks for workloads
    • same with/without CRI: networking is set up once per-node
    • better question for sig-networking
    • If required, CRI can evolve as well.
  • garden team: demo
  • docker 1.12 live restore (https://docs.docker.com/engine/admin/live-restore/): should it be enabled by default?
    • Related to disruptive updates; if we wish to be less disruptive
    • CoreOS does not enable it currently, but @euank doesn't know if there's a specific reason
  • Shaya/AppOrbit, Infranetes public cloud CRI demo
    • Enables pods to be isolated within independent VMs
      • differs from hyper in that the VMs can be separate public cloud instances where nested virt isn't supported
    • Enables orchestration of full VM images that aren't running containers but act as Pods to the k8s cluster

Nov 29 Agenda

  • FYI: Shared pid namespace
    • No discussion needed
    • PR for rollout sent for review
    • brendanburns suggests we consider supporting isolated namespaces indefinitely
  • rkt attach demo
    • Implementing the design proposed in #34376.
    • Addresses a problem of the pre-CRI rkt implementation; supports kubectl attach and kubectl run -it
  • Dawn: Some ideas for sig-node next year:
    • Node-level debuggability
    • Node management / availability (daemon update, repair, etc)
    • Node allocatable rollout (daemon overhead, if I understand correctly; see the arithmetic sketch at the end of this section)
    • CRI, validation test, beta/stable
    • Checkpointing (finally)
    • Tackle logging story
    • Auth/security between daemons / auth between applications
    • Resource management (related to many of the above too, yeah)
      • Mostly for reliability, not efficiency
      • pod overhead, etc
      • Better per-node performance guarantees, etc.
      • Disk management
    • Final: kubelet as a standalone thing/"product"
      • Checkpointing, node level api and versioning
  • virtlet
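
A minimal worked example of the node-allocatable arithmetic behind the rollout item above. The reservation numbers below are invented for illustration; the shape is capacity minus kube/system reservations minus the eviction threshold, not an exact copy of the kubelet's implementation.

```go
// Illustrative only: how node allocatable is derived from capacity and
// reservations. All numbers are made up for the example.
package main

import "fmt"

func main() {
	capacityMiB := 16384        // total node memory
	kubeReservedMiB := 1024     // reserved for kubelet / runtime daemons
	systemReservedMiB := 512    // reserved for the OS and system daemons
	evictionThresholdMiB := 100 // hard eviction threshold

	allocatable := capacityMiB - kubeReservedMiB - systemReservedMiB - evictionThresholdMiB
	fmt.Printf("allocatable memory: %d MiB\n", allocatable) // 14748 MiB
}
```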

Nov 22

  • Announcements & sync-up
    • Derek: Putting together a workgroup that reports back to sig-node for resource management: specifically, to allow management of resources outside the kubelet for exploratory work, and to identify which resources should be managed by the kubelet.
    • Look for an announcement sometime next week
  • Status of CRI on 1.5
    • v1alpha "done" for docker, community can and should try it and give feedback
  • Shared pid namespace (verb@google.com)
    • First step is to make the infra container reap zombies
    • https://github.com/kubernetes/kubernetes/pull/36853
    • But will the infra container even be around for all runtimes in the future?
    • Yes please!
    • First step is the pause container as an init process (see the reaper sketch at the end of this section)
    • Other runtimes already handle that (e.g. rkt and cri-o)
    • On by default?
      • Some containers assume they are PID 1 (e.g. for exec kill -SIGHUP 1, or in their messy /entrypoint.sh pseudo-init).
      • Some containers also bundle init systems
    • Dawn: there was discussion about having our own init for a pod
      • rkt, pause container, cri-o all have init processes. infra is a bit of a hack and docker-specific, but we should be able to get rid of those.
      • For now, his change makes sense, just go with it, and we can consider long-term unification in parallel/later
  • Backwards compatibility concerns:
    • disruptive CRI
    • disruptive cgroup rollout
    • Have we done disruptive rollout before?
      • GKE does that.
      • OpenShift: drains nodes before such updates.
      • In the past, maybe docker labeling broke this? No specific memory.
    • Currently planning to make both of those disruptive
      • Euan: Action item, double check this is sane from CoreOS side
  • Hi from Cloud Foundry garden team! (What can we share, how can we help?)
    • Next call, maybe demo and talk about garden a little
  • rkt roadmap:
    • 1.5 continuing to work to get attach and e2e happy (might bleed into 1.6)
    • 1.6 alpha release and recommend general "tire-kicking"
  • CRI status, api interface. Rollout in 1.5, alpha api, what does that mean?
    • Cluster api, we don't suggest prod usage because it might change
    • This is internal, so it's different, right? Compatibility is an internal detail, not external/user.
  • Will CRI support exclusively CNI for networking?
    • Furthermore, is the network config missing from the CRI?
      • Maybe? It's alpha
    • Come back next week
  • 1.6 roadmap planning
    • Community meeting talked about reliability.
      • Resource management
      • disk management
    • Lots of "in-flight" features which are not marked stable yet and have not been seen through.
    • Use 1.6 release to "finish work" and rollout
      • CRI
      • Pod Cgroup
      • Node allocatable
      • …..
    • Part of "nodespec work"
    • Focus on reliability and rollout of features. Finish node level testing.
    • Let us/Dawn know about other potential items for the roadmap.
    • Expected date? presumably before 1.6 :) TBD
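
A minimal sketch of the "pause container as an init process" idea from the shared-pid-namespace discussion above: a PID 1 that does nothing but reap re-parented zombies. This is illustrative only (Linux-specific) and is not the actual pause container implementation.

```go
// Sit at PID 1 in the pod's shared PID namespace and reap any exited
// children that get re-parented to us, so zombies do not accumulate.
package main

import (
	"os"
	"os/signal"
	"syscall"
)

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	for range sigs {
		// Reap every exited child that is currently waiting on us.
		for {
			var status syscall.WaitStatus
			pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
			if pid <= 0 || err != nil {
				break
			}
		}
	}
}
```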

Nov 01

  • Image Exec (verb@google.com)
    • Better support for containers built from scratch
    • kubectl exec -it -m image_name pod_name
    • Proposal: https://github.com/kubernetes/kubernetes/pull/35584
      • Use case: primarily dev clusters only, or dev+prod?
      • Both
    • Mount namespace vs filesystem view (e.g. proc and dev and so on might differ)
      • No solution offered for this
    • Pod lifecycle for this pod
      • run a pod + exec into a pod
      • Dangling pod problem with kubectl run --rm?
        • No answer known yet
    • Display to user:
      • Is it hidden from regular listing? (e.g. get pod)
      • Right now there's an idea of separate init container statuses. Will there be a separate debug container construct?
      • There's been discussion before of "tainting" pods that have been exec'd into, debugged.
    • Resources? This can add additional resource needs. Do we reserve things? Push it onto the user?
      • derekwaynecarr: Size pods in advance to be able to be debugged
    • Cleanup / reliability:
      • Fairly intrusive..
    • Security?
      • Image whitelist / blacklist?
        • That's being added, but now needs to be added in one more place
        • Admission controllers will need to tie in too
      • one implication: Userns + host bindmounts, depending on how userns is implemented, could be messy (hostmount implies no userns, but userns relabeling might be enabled for the whole pod)
        • No answer to this.
      • Does SELinux interact with this? Do we need to relabel container rootfses in a scary way?
        • No answer for this.
      • Concern about how this interacts with introspection tools RH has that e.g. scan podstatus for image IDs
    • Alternative: Allow modifying the podspec to run a new container?
      • Dawn: Ideally, but that has a problem of needing more upstream changes, can't just be kubelet
  • Kubelet pod API, initial version
    • New kubelet API for bootstrapping
    • https://docs.google.com/document/d/1tFTq37rSRNSacZeXVTIb4KGLGL8-tMw30fQ2Kkdm7JQ/edit?usp=sharing
    • Minimal guarantees in this implementation ("best-effort")
    • Differences from static pods:
      • These are persisted into the api-server whereas static+mirror are "fake"/read-only
      • These give you a "real" r/w copy in api-server
    • Other option is bootkube. Temporary control plane with a "real" api-server, client transitions between em
      • Complexity, painful to maintain the code
    • This implementation adds a new API-backed pod source
    • Due to security concerns, it would be a default-off api, potentially listening on a unix socket (a hypothetical client sketch is at the end of this section)
    • Will create pods before the api-server is connected to. Will move them to the api-server when able to
      • Api-server pod persistence will result in a pod restart, effectively to fix up UIDs
    • derekwaynecarr: We want to get rid of the existing ways; what other alternatives are there?
      • The bootstrap/run-once kubelet was shot down. And this has some other nice properties as well (e.g. disaster recovery)
    • Not a full api-server in the kubelet though; essentially only pods stored by node name
    • Derek: Will this be versioned? Will it be pretty and hygienic? Do we distribute clients?
      • We don't have answers for those
      • Maybe just experimental for now and not tackle those?
        • Derek: Concerned with experimental features that aren't as well thought out
  • No sig-node weekly meeting on the 8th
  • Demo: rkt with runc (@casey)
    • Does runc list
  • Kubecon f2f details: https://docs.google.com/spreadsheets/d/1B856NU1Ie0Pid4xGV2D9QZUJhUD24QzYsJYqH0yJU8A/edit#gid=0
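
To make the "default-off API on a unix socket" idea from the kubelet pod API item concrete, here is a hypothetical client sketch. The socket path, URL path, and payload are invented for illustration and are not part of the proposal; the real shape is in the linked design doc.

```go
// Hypothetical sketch: submit a pod manifest to a kubelet bootstrap API
// that listens on a local unix socket instead of a TCP port.
package main

import (
	"bytes"
	"context"
	"fmt"
	"net"
	"net/http"
)

func main() {
	const socketPath = "/var/run/kubelet-pods.sock" // assumption, not the real path

	client := &http.Client{
		Transport: &http.Transport{
			// Route all requests to the unix socket regardless of host.
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", socketPath)
			},
		},
	}

	manifest := []byte(`{"kind":"Pod","apiVersion":"v1","metadata":{"name":"bootstrap-apiserver"}}`)
	resp, err := client.Post("http://localhost/pods", "application/json", bytes.NewReader(manifest))
	if err != nil {
		fmt.Println("kubelet pod API not available:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("kubelet responded:", resp.Status)
}
```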

Oct 25 Agenda

NOTE: This meeting is being held in ZOOM due to troubles with hangouts: http://www.zoom.us/my/sigwindows

  • Conflict with DiskPressure and Volume cleanups
    • https://github.com/kubernetes/kubernetes/issues/35406#issuecomment-256101016
    • Suggestion: We should be always cleaning up memory-backed volumes, but at minimum we need to start doing that on eviction
    • Prioritization? Red Hat is willing to help fix this for 1.5 since it's causing real pain for them with secrets + eviction
    • Possible problem: Kubelet falls over, the pod is removed, the kubelet comes back up, and the volume manager can't trivially tell from its state whether an emptydir is tmpfs (a /proc/mounts check like the sketch at the end of this section could serve as a fallback)
      • Doesn't matter, volume manager should still clean it up, unmount, what have you? Dig into it a bit more and comment on that issue
  • Image Exec (verb@google.com)
    • Better support for containers built from scratch
    • kubectl exec -it -m image_name pod_name
    • Draft proposal: http://bit.ly/k8s-imageexec
    • verb@ to open a PR to main k8s repo
    • tl;dr: Run another container image in the same namespaces as the pod; use the other image to debug the pod.
    • Expected to be reviewed this week
      • Probably discussed more next week in sig-node as well
    • Some questions about how to properly deal with viewing its mount namespace
      • Post them on the proposal!
  • Quick rktlet demo (init containers & adding bindmounts)
    • Demo will happen next week :)
  • rktlet status
    • Vendoring is in progress (a couple of dependency issues); to be followed by e2e testing
    • Attach & logging
  • Kubelet in a chroot https://github.com/kubernetes/kubernetes/pull/35328
    • CoreOS has been shipping kubelet in a chroot for the last half year
    • Volumes might require path remapping
    • Does not concern most kubelet developers
  • Kubelet mounts volumes using a container
    • Using rkt fly to run a mounter image to set up kubelet volumes.
    • Stopgap solution until kubelet-in-a-chroot lands
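
A minimal sketch, assuming Linux, of the kind of fallback check mentioned in the DiskPressure/volume-cleanup item above: ask /proc/mounts whether a path is a tmpfs (memory-backed) mount when the volume manager's own state is gone after a restart. This is not the volume manager's actual logic.

```go
// Decide whether a volume path is a memory-backed (tmpfs) mount by
// consulting /proc/mounts.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func isTmpfsMount(path string) (bool, error) {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return false, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each line: <device> <mountpoint> <fstype> <options> ...
		fields := strings.Fields(scanner.Text())
		if len(fields) >= 3 && fields[1] == path && fields[2] == "tmpfs" {
			return true, nil
		}
	}
	return false, scanner.Err()
}

func main() {
	ok, err := isTmpfsMount("/dev/shm")
	fmt.Println(ok, err)
}
```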

Oct 18

Oct 11

  • Reducing privileged components on a node
    • Can other components (kube-proxy) delegate privileged host operations to kubelet? (e.g. firewalld operations)
    • Dawn: Opinion, that makes kubelet more monolithic. It's the main agent, but it should be able to delegate. Preference for moving things to separate plugins where reasonable.
    • Euan: Counterpoint, multiple things talking to the api-server have extra auth problems
    • Is kube-proxy actually core? Out of scope of this sig :)
    • Minhan: Note that kube-proxy is an implementation detail; over time it will potentially differ.
      • This discussion is also about more than just kube-proxy
  • Pod cgroups (demo update - decarr@redhat.com)
    • q on default: Can't it default to 'detect' and try docker info, and if it doesn't have info fallback to cgroupfs?
      • It does do the right thing for docker integration, but the document says the wrong default :)
    • Note: Only works with systemd 229+ because of the opencontainers slice management stuff
    • Upgrade of existing node to this feature?
      • Evacuate the node first. We don't support 'in-place', it's not in-scope
    • We don't currently tune memory limits over time either
    • Some docker-things are not charged per-cgroup (like docker's containerd-shim for example)
      • Also not in-scope; upstream changes
    • Euan: Will also look into making sure the systemd driver works well for rkt. It should work with, effectively, systemd-run --slice=$parent under the systemd driver (see the sketch at the end of this section)
    • Yuju: We can add the info call to give that info to CRI, just do it :)
  • When can we assume pod-cgroups exist, e.g. for eviction and so on?
  • CRI logging proposal: https://github.com/kubernetes/kubernetes/pull/34376
    • q: This is only under CRI/experimental, right? Yes, it's part of an experimental flag, default should not differ
  • CRI networking: https://github.com/kubernetes/kubernetes/pull/34276
    • Original issue: https://github.com/kubernetes/kubernetes/issues/28667
    • Who should own it? Kubelet or runtime?
    • How about the configs (e.g. plugin dirs etc)
      • Freehan: Eventually move to part of the runtimes, deprecate kubelet flags
    • Will kubenet still exist? Only CNI?
      • Eventually it'll be a cni plugin perhaps
    • sig-networking discussed this some
      • Some considered out-of-band vs passing through all networking stuff
    • In the future, higher level "Network" objects might exist. Already, networking exists as a kubernetes construct to a degree.
      • CRI will have to grow for this to include a good chunk of that.. or out-of-band
      • In the future, the 'UpdateConfig' object might expand and these objects will have to be somewhat runtime-implemented
    • CRI will have to include tc (traffic control) type stuff so that the runtime can apply network shaping
    • There's also, maybe, the implicit assumption that networking metrics aren't core once they're moved out of the kubelet
    • Conclusion: Let's roll forward so we can look at more than just a tiny start; reconvene next week.
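
A rough sketch of what the systemd cgroup driver boils down to for a runtime like rkt, per Euan's note above: start a process in a transient scope under a parent slice via systemd-run. The slice name here is invented for illustration and is not the kubelet's real naming scheme.

```go
// Launch a process in a transient scope under a (hypothetical) pod-level
// slice, which is roughly what "systemd-run --slice=$parent" amounts to.
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	cmd := exec.Command(
		"systemd-run",
		"--slice=kubepods-pod1234.slice", // hypothetical parent slice name
		"--scope",
		"--", "sleep", "30",
	)
	out, err := cmd.CombinedOutput()
	fmt.Printf("%s err=%v\n", out, err)
}
```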

Oct 04

  • CRI status updates
  • rktlet demo
  • KubeCon sig-node F2F tentatively planned for 11/7. Kindly respond to the existing thread if you can make it. Video conferencing can be set up for remote attendees.
  • Node-e2e; should it use kube-up-like environments instead of what it has now?
    • Provides benefit in that you're testing a more "production" environment
    • Could at least share the startup scripts fairly easily
    • If we had a standardized "node-setup" shared piece at the bottom, then we could more easily test/validate by knowing there's this one way.
    • The current kube-up setup has duplicated code and burden, and it makes it tricky to claim distro x is supported. Goal is to make it easier to add new distros and maintain them better.
    • Share more code at the node level for setup. Document the exact steps needed for setup in general.

Sept 27

Sept 20

Sept 13

Sept 06

  • Pod Container Status ImageID (derekwaynecarr DirectXMan12) https://github.com/kubernetes/kubernetes/issues/31621
    • Right now we run by tag / sha. When you run by tag, you lose auditability. There's a local image id, but that image id is not the content-addressable ID
    • ContainerStatus ImageID is thus not actually useful at all.
    • Should users be able to determine exactly the image running? A) obviously yes, but you can't now
    • History: Why do we have ImageID?
      • Based on the 'ssh into a node, docker ps' stuff mostly
    • Docker does not provide the sha when you pull by tags in a structured way, just a status text (what you see printed when you docker pull foo:tag)
    • Docker does not always populate repo digest
    • Possible solution: Have the kubelet resolve tag -> digest itself (a rough sketch using docker inspect is at the end of this section)
      • Downside: the kubelet now relies less on docker and has to do more itself
    • Maybe an admission controller could translate?
    • What do we do with the existing field? Do we replace it with hash, or do we have to add a new field?
    • Should it be lockstep across all nodes; resolve it before hitting kubelet?
      • I don't think we can because of RestartPolicy + ImagePullPolicy interactions. Unrelated.
    • This issue is just informational; ContainerStatus tells the truth of the current running reality, no more, no less, no other changes.
    • Discuss on the issue
  • User-namespace
    • https://github.com/kubernetes/kubernetes/pull/30684
    • Was not included in 1.4 as a "late exception"
    • Discussion about whether this should be done differently, what the process really should be..
    • TODO: we have some idea of what it should be, @derek to write up / provide a link
    • More inclusion of the community would help in terms of transparency as well
    • Push for userns in 1.5 :)
  • Pre-review of node-team 1.5 features
  • Node-e2e flakes a bit more on CoreOS than other images
    • Should we be running these tests for all distros? Is there really value in blocking per-pr?
    • AI: dawnchen: file the issue to carry the discussion
  • Rktnetes sync notes:
    • Moving to rktlet repo and re-vendoring to kubelet
    • WIP CRI implementation on rkt side
    • Encourage creating issues about rktnetes in rktlet repo
      • We recognize issues will still be filed against the main repo, and we don't want to proactively move issues because github doesn't support that, but if we can start with them there, that's easier to triage for us.
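
A rough sketch of the "kubelet resolves tag -> digest itself" option discussed above, using the docker CLI's RepoDigests field. It is illustrative only, and (as noted in the discussion) docker does not always populate the digest.

```go
// Look up the content-addressable repo digest(s) recorded for an image tag.
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

func repoDigests(image string) ([]string, error) {
	out, err := exec.Command(
		"docker", "inspect", "--format", "{{json .RepoDigests}}", image,
	).Output()
	if err != nil {
		return nil, err
	}
	var digests []string
	if err := json.Unmarshal(out, &digests); err != nil {
		return nil, err
	}
	return digests, nil // may be empty: docker does not always populate it
}

func main() {
	d, err := repoDigests("busybox:latest")
	fmt.Println(d, err)
}
```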

August 30

August 23 -- Cancelled

  • Node conformance test suite
  • Minikube + rktnetes, demo @sur (move to next week as well)

August 16

  • Node performance benchmark (zhoufang)
    • Slides link: https://docs.google.com/presentation/d/1pYNnKo7OF-IHOwnSJ1hZvKmEwzEKc2y3IoZIOjYcDK4/edit
    • Do we have a repo / issue / PR we can follow for running this on our own?
      • Not yet, more stuff needs to be merged and so on first. TODO, make and link a tracking issue (maybe a future sig-node)
    • Will this be integrated with the existing testing dashboards / gubernator stuff?
      • For now pushed to GCS, in the future talk to infra team
    • Is there an option to push this to prometheus? Other sigs have made it possible to support e2e -> prometheus pushes
      • Not right now? We don't have a real answer right now
    • Long term, we should be alerting when there are regressions shown in this, it will be automatically run
    • How does this work?
      • Standalone cadvisor
      • It should already be mostly runtime agnostic
    • Yifan and Zhou to sync on how to run this for rktnetes to get pretty results there :)
  • CRI: Is there a way to show a runtime's cgroup?
    • Not at this moment afaik; it would make sense to add it as part of the 'Version' api
    • Should we add a cgroup path for the runtime's cgroup(s?) to the kubelet?
      • Vish: The kubelet cares about pods and the node as a whole, why do we care about this?
    • Derek: Open an issue for this, more discussion https://github.com/kubernetes/kubernetes/issues/30702
  • CRI attach/exec/portforward
  • CRI area owners
    • Dawn brought up that we should discuss area owners to help drive the progress in individual areas.
  • Could we add GPU discovery/assignment capabilities to kubelet in v1.5?
  • Should we use annotations to expose somewhat security-related pod features, specifically sysctls
    • Sysctls: https://github.com/kubernetes/kubernetes/pull/26057
    • Current proposal is first-class, not annotations. Concerns around annotations, security…
    • Vish: Why is validation a concern for an alpha opt-in feature?
    • (ref app-armor, also annotations)
    • Maybe have a node-whitelist that is enforced on the kubelet layer. Kinda messy UX, but it should resolve these security concerns
    • Do we need to have scheduling decisions?
      • We can use taints-tolerances
    • Derek: What are the next steps for sysctls? We already know what's accounted / not accounted; how do we decide a whitelist?
      • Vish: The proposal didn't make this clear enough; it needs the information in a better form.
      • In the 1.4 timeline: Start with a node whitelist (default empty list of sysctls), and then on a per-kubelet basis people can choose what they're okay with (see the whitelist sketch at the end of this section)
      • Vish: Comment something to the above's effect on the sysctl PR
  • Pod level cgroups: will they land in 1.4 timeline?
    • Dawn has it marked for 1.5 as a p0.
    • Vish: Probably won't happen this week
    • Action Item (?): Disable flags for 1.4 if it's not making it in
  • UsernsMode
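
Illustrative only: the shape of the per-node sysctl whitelist discussed above, where the kubelet rejects any requested sysctl the operator has not explicitly allowed. The entries and function names are invented for the example.

```go
// Reject pod-requested sysctls that are not on the operator-configured
// per-node whitelist (default: empty, i.e. nothing allowed).
package main

import "fmt"

var nodeSysctlWhitelist = map[string]bool{
	// Hypothetical operator opt-in; the default list is empty.
	"net.core.somaxconn": true,
}

func sysctlsAllowed(requested []string) error {
	for _, s := range requested {
		if !nodeSysctlWhitelist[s] {
			return fmt.Errorf("sysctl %q is not in the node whitelist", s)
		}
	}
	return nil
}

func main() {
	fmt.Println(sysctlsAllowed([]string{"net.core.somaxconn"})) // allowed
	fmt.Println(sysctlsAllowed([]string{"kernel.shmmax"}))      // rejected
}
```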

August 9

  • Add a k8s repo for rkt/CRI integration ? (rktlet)
  • Add support for UsernsMode?
    • Redhat wants it for 1.4 and is willing to do the work for it
    • No need to change the default, just make it possible
    • Willing to also do podsecuritycontext stuff so administrators can control it correctly
    • Dawn: Some concerns about whether we can finish it in time since we have a "freeze" coming up so soon. We do want the feature though (👍), but we need to follow the rules.
    • Some technical/semantic issues with userNS since it will break some existing kubernetes features (other namespacing, mounts of volumes owned by root, etc, kube tools?)
    • Proposal incoming for future discussion with use case
  • Docker validation updates
    • No manual validation this time. We only have automated docker validation results, and they only run on GCI.
    • Current test result:
      • Functionality: We are running node e2e in automated docker validation, currently all green against docker 1.12.0.
      • Performance: Not yet. We will run kubelet performance test in node e2e against docker 1.12.0 soon.
    • As with previous releases, we still support multiple docker versions, and documented the known issues
  • 1.4 feature updates
  • So far we've improved node-e2e in various ways.
    • Next step, maybe not for 1.4: package conformance tests as a container image that can run all the tests against that node
      • Substep 1: static binary
      • Substep 2: docker image
  • Status on kubelet/Docker CRI integration
    • In progress: added a kuberuntime package on the kubelet side to use CRI (a simplified sketch of the interface split is at the end of this section).
      • Not tied to any release, still WIP.
    • Added a shim for docker to implement CRI. Currently this supports only basic features with little test/validation. This is blocked on the kuberuntime implementation for more complete validation.
    • Other parts of CRI are still under discussion (e.g., exec with terminal resizing, metrics, logging, etc)
  • Follow-up on new repo in Kubernetes org for node feature discovery
    • https://github.com/kubernetes/kubernetes/issues/28311
    • Process: how do you reach an actual conclusion at the end, who owns the new repo, etc. The decision is still not well defined, and the procedure still needs help
    • Should this be part of the k8s-community meeting or k8s-dev mailing list rather than sig-node?
  • Very short update on sysctls (sttts):
    • "the table": kernel code-level analysis of sysctls on proposed whitelist: bd832c9879/docs/proposals/sysctl.md (summary)
    • kmem accounting for ipc works until kernel 4.4
    • broken since 4.5 due to switch to opt-in; probably simple fixes
  • Remove some kubelet dependencies (dims) (PR's for pidof, brctl, pkill) - Do we want to do some of the cleanup for 1.4?
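
A simplified, conceptual Go sketch of the interface split referenced in the CRI integration status above: an imperative pod-sandbox/container runtime service plus an image service, which the new docker shim implements and the kuberuntime package drives. These interfaces are invented for illustration and are not the actual CRI definitions.

```go
// Conceptual stand-ins only, not the real CRI proto or Go types.
package main

import "fmt"

// RuntimeService is a simplified stand-in for the pod/container half of CRI.
type RuntimeService interface {
	RunPodSandbox(sandboxConfig string) (sandboxID string, err error)
	CreateContainer(sandboxID, containerConfig string) (containerID string, err error)
	StartContainer(containerID string) error
	StopPodSandbox(sandboxID string) error
}

// ImageService is a simplified stand-in for the image half of CRI.
type ImageService interface {
	PullImage(image string) error
	ListImages() ([]string, error)
}

func main() {
	fmt.Println("kuberuntime drives a RuntimeService + ImageService pair; dockershim implements both")
}
```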

August 2

  • Discuss and get feedback on adding snap as a supported datasource to heapster
    • There's been discussion of splitting metrics out into "core" and "monitoring"/additional ones
    • Core ones should be consistent, well understood, defined strongly by kubelet probably
    • Heapster currently does both core and monitoring. If snap is meant to be in addition to "core" metrics, then that's great, if it's meant to also replace "core" then it needs to be a more involved process.
  • Proactively reclaiming node resources, possibly with an option of administrator provided scripts https://github.com/kubernetes/features/issues/39#issuecomment-235069913
  • Discuss sysctl proposal questions and kmem (sttts and derekwaynecarr) https://github.com/kubernetes/kubernetes/pull/26057#issuecomment-236574813
    • We can expose sysctls as knobs so long as they're properly accounted for in the memcg
    • Argument for "unlimited" other than the memcg limits for some of them (e.g. tcp buffers)
      • Potential issues if applications change their behavior based on specific sysctl values
    • Further experiments to confirm the whitelisted sysctls are all "safe" to increase; they're all namespaced, but are they all resource-isolated (accounted)?
    • Separate discussion:
      • Node defaults for sysctls
  • Brief follow up on new repo in Kubernetes org for node feature discovery (Connor/Intel, cross-posted to sig-node from dev).
  • Reminder: Expecting code/feature freeze in 3 weeks for v1.4. Bug-fixes and additions to 1.4 features will be accepted beyond that cutoff, but not new feature impls.

July 26

Agenda:

July 19

Aside:

We need some container runtime interface implementation term that is pronounceable.
  • oclet (awk-let) rklet (rock-let) docklet (dock-let)

Action:

  • Feature owners: File a feature in the feature repository by Friday which at least has a good title
  • Paul: Tracking issue for kubelet cleaning/refactoring

July 12

July 5

Cancelled today.

June 28

Cancelled today.

June 21

June 14

  • Cancelled

June 7

May 31

Docker v1.10 is having performance issues. @timothysc https://github.com/kubernetes/kubernetes/issues/19720

Derek Carr is working on updating systemd config. Needs help with updating existing CoreOS images.

Node e2e has ability to pass ginkgo flags. Tests that fail on RHEL can be blacklisted using ginkgo skip flags.

Container Runtime Interface Discussion - How to make progress? Vendors need to be unblocked. - https://github.com/kubernetes/kubernetes/pull/25899

  • How will new features be implemented given the pending refactoring? Dockertools package will be moved to use the new interface.
  • New runtimes will not be part of the main kubernetes repo
  • Yu-Ju will be working on re-factoring the docker runtime package to work against the new API.
  • Euan to review the runtime proposal today and provide his feedback. There haven't been any other concerns from the community as of now

May 24

May 17

May 10

May 3

April 26

  • New ContainerRuntime interface discussion

    • Difficult to integrate at a pod-level declarative interface.
    • Expect an imperative container-level interface, and OCI compatibility in the long run
    • Proposed 2 options:
        1. introduce OCI-compatible interfaces now
        2. introduce a docker-like container-level interface first
    • AI: Yu-Ju to write a high-level design doc and continue discussing next week
  • Node on Windows - initial investigation / next steps

  • Followup with custom metrics discussion

    • AI: agoldste@ from Red Hat is going to write a high-level requirements doc to share with us. A separate VC to continue the discussion.
  • rktnetes status updates:

  • NVIDIA GPU support:
    • Kubelet:
      • The kubelet should have "--device" options and volumes for GPUs; NVIDIA GPU is part of it, and it should also have an interface to support other GPUs.
      • cAdvisor includes the NVML libs to find NVIDIA GPUs on the host; the kubelet could then send GPU information to the kube-scheduler.
    • Kube-scheduler:
      • Include GPU information (see the sketch at the end of this section):
        • Number: how many GPUs needed by the container.
        • Vendor: so far, only NVIDIA GPU.
        • Library version: run the container on the right host, not just a host with GPUs.
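
A hypothetical illustration of the scheduler-facing GPU information listed above. The struct, field names, and values are invented for the example and are not a proposed API.

```go
// Hypothetical GPU requirement a container could declare and the scheduler
// could match against node-reported GPU information.
package main

import "fmt"

type GPURequirement struct {
	Number         int    // how many GPUs the container needs
	Vendor         string // so far, only "NVIDIA"
	LibraryVersion string // land on a host with a compatible driver/NVML version
}

func main() {
	req := GPURequirement{Number: 2, Vendor: "NVIDIA", LibraryVersion: "352.39"}
	fmt.Printf("%+v\n", req)
}
```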

April 19

  • InitContainer proposal discussion: (#23666)
  • Demo: Out-of-resource eviction?
  • rktnetes status updates:
  • cAdvisor roadmap updates:
    • punt standalone cAdvisor
    • cAdvisor validation and testing
  • announcement: minikube on local host

April 12

April 4

Mar 29

  • Kubelet issue with termination - issue with watch: https://github.com/openshift/origin/issues/8176
  • Yifan looking at issues with rkt pod startup latency
  • State of cAdvisor refactoring
    • Tim St. Clair mentioned that there will be no changes inside the kubelet in the near term.
  • Systemd driver in Kubelet - Does it need first class support?
    • Probably not. We need more information to discuss further
  • Kubelet bootstrapping
    • GKE will not switch, so there are no immediate changes required for other providers.
  • Kubelet CI
    • Kubelet Node e2e is now a merge blocker
    • Hackathon next week - will be open to the entire SIG. Goal is to increase test coverage
    • How do we add a test to the Conformance suite?
      • It needs to include only the core kubelet features. But the Kubelet API is expected to be portable.
      • Some tests have failed on Systemd in the past like Filesystem accounting
      • Some of the kubelet API is distro dependent.
      • Why/When write a node e2e test?
        • Any node level feature that can be tested on a single host.
        • Simpler test infrastructure. Easier to test node conformance
        • Smaller scope
      • We need multiple deployment configurations for the e2e suite and have the tests be able to discover these configurations.
  • Increase maximum pods per node for kube-1.3 release #23349

Mar 22

Mar 15

Mar 8

Mar 1

Node e2e tests

  • Run against PRs using the trigger phrase "@k8s-bot test node e2e experimental"
  • Run locally using "make test_e2e_node"
  • Tests can be built to tar.gz and copied to arbitrary host
  • Would like to distribute testing of distros and setup dashboard to publish results

rktnetes status:

Feb. 23

Docker v1.10 integration status

  • Lantao has fixed all the issues that have been identified. The Docker go client had an issue that has been fixed. Docker startup scripts have been updated. A PR fixing the -d option is here: https://github.com/kubernetes/kubernetes/pull/20281

  • Prior to upgrade we'll need a daemonset pod to migrate the images.

  • Vishh is working on disabling seccomp profiles by default.

Yifan and Shaya status updates:

CNI plugin support for rkt

TODO: vishh to post documentation on patch policy for releases.

v1.2 will support 100 pods by default.

Docker v1.9 validation is having issues. We are performing a controlled restart of the docker daemon to try and mitigate the issue.

Feb. 16

  • Projects and priorities brainstorm
    • Refactor kubelet: Too complex for new developers; we should refactor the code. Better separation and architecture is needed.
      • Dawn: One important thing is to clean up the container runtime and image management interfaces. Maybe separate pod-level and runtime-level apis.
      • Tim is working on cleaning up the cAdvisor-kubelet interface.
      • Should have a sig-meeting soon.
      • @Dawn will file an issue about this.
    • Better Disk management (disk resource quota mgmt)
    • Performance requirement for 1.3
    • Kubelet machine health API
      • Kubelet should provide an api to receive categorized machine problems from "Machine Doctors" such as a machine monitor, kernel monitor, etc.
      • Some existing systems such as Ganglia https://github.com/kubernetes/kubernetes/issues/21333
      • Who should take actions: Kubelet? Application? Operator?
      • Use DaemonSet to handle it and mark kubelet as NotReady?
    • Determine if we should support cpuset-cpus and cpuset-mem: https://github.com/kubernetes/kubernetes/issues/10570
    • Arbitrary tar file?

Feb. 10

Feb. 3

Jan. 27

Jan. 20

  • OCI meeting updates
  • rkt / CoreOS integration status updates
  • 1.2 features status recap

Jan. 13

  • Node scalability issue recap. Discussed via issues and PRs
  • systemd spec https://github.com/kubernetes/kubernetes/pull/17688 ready for implementation. Minor discussion on the cgroup library problems in libcontainer.
  • OCI meeting is going on, will have more updates in next sig-node sync

Jan. 6