adding contributor summit notes

This commit is contained in:
parispittman 2018-05-14 08:55:53 -07:00
parent 643702eb4b
commit 0761a866ee
5 changed files with 436 additions and 0 deletions


@ -0,0 +1,139 @@
# Client-go
**Lead:** munnerz with assist from lavalamp
**Slides:** combined with the CRD session [here](https://www.dropbox.com/s/n2fczhlbnoabug0/API%20extensions%20contributor%20summit.pdf?dl=0) (CRD is first; client-go is after)
**Thanks to our notetakers:** kragniz, mrbobbytales, directxman12, onyiny-ang
## Goals for the Session
* What is currently painful when building a controller
* Questions around best practices
* As someone new:
* What is hard to grasp?
* As someone experienced:
* What important bits of info do you think are critical
## Pain points when building controllers
* A lot of boilerplate (see the controller sketch at the end of this list)
* Work queues
* HasSynced functions
* Re-queuing
* Lack of deep documentation in these areas
* Some documentation exists, but focused on k/k core
* Securing webhooks & APIServers
* Validation schemas
* TLS, the number of certs is a pain point
* It is hard right now, the internal k8s CA has been used a bit.
* OpenShift has a 'serving cert controller' that will generate a cert based on an annotation; it might be possible to integrate it upstream.
* Leader election has been problematic, and the scale API is low-level and hard to use. It doesn't work well if a resource has multiple meanings of scale (e.g. multiple pools of nodes)
* Registering CRDs, what's the best way to go about it?
* No single best way to do it, but CRDs have been deployed along with the application
* Personally, deploy the CRDs first for RBAC reasons
* A declarative API on one end has to be translated to a transactional API on the other end (e.g. ingress), with the controller trying to change quite a few things.
* You can do locking, but it has to be built.
* Q: how do you deal with "rolling back" if the underlying infrastructure
that you're describing says no on an operation?
* A: use validating webhook?
* A: use status to keep track of things?
* A: two types of controllers: `kube --> kube` and `kube --> external`,
they work differently
* A: Need a record that keeps track of things in progress. e.g. status. Need more info on how to properly tackle this problem.
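To make the boilerplate above concrete, here is a minimal sketch of the usual client-go wiring (an illustration, not code from the session: the Pod informer, the kubeconfig handling, and the `syncHandler` name are arbitrary, and a real controller would add retry limits and graceful shutdown):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/workqueue"
)

// syncHandler is a placeholder for real reconcile logic.
func syncHandler(key string) error {
	fmt.Println("syncing", key)
	return nil
}

func main() {
	// Assumes a kubeconfig at the default location; error handling is trimmed.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Shared informer factory; a resync period of 0 means "rely on the watch".
	factory := informers.NewSharedInformerFactory(clientset, 0)
	podInformer := factory.Core().V1().Pods().Informer()

	// Rate-limited work queue: one piece of boilerplate the session called out.
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
				queue.Add(key)
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)

	// HasSynced / WaitForCacheSync: don't process events until the cache is warm.
	if !cache.WaitForCacheSync(stop, podInformer.HasSynced) {
		panic("cache never synced")
	}

	// Worker loop: pop keys, sync, and re-queue with backoff on error.
	for {
		key, shutdown := queue.Get()
		if shutdown {
			return
		}
		if err := syncHandler(key.(string)); err != nil {
			queue.AddRateLimited(key) // re-queue with rate-limited backoff
		} else {
			queue.Forget(key)
		}
		queue.Done(key)
	}
}
```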
## Best practices
(discussion is shown with Q: for a question and A: for an audience comment or answer)
* How do you keep external resources up to date with Kubernetes resources?
* A: the original intention was that controllers watching external resources use the sync period on the controller; use that for this case
* Should you set resync period to never if you're not dealing with
external resources?
* A: Yes — the watch should deliver everything; if it doesn't, that's a bug
* A: the controller automatically relists on connection issues; the resync interval is *only* for external resources (see the resync sketch at the end of this section)
* maybe it should be renamed to make clear it's for external resources
* how many times to update status per sync?
* A: use status conditions to communicate "fluffy" status to user
(messages, what might be blocked, etc, in HPA), use fields to
communicate "crunchy" status (last numbers we saw, last metrics, state
I need later).
* How do I generate nice docs (markdown instead of swagger)
* A: kubebuilder (kubernetes-sigs/kubebuilder) generates docs out of the
box
* A: Want the IDL pipeline that runs on native types to also run on CRDs and feed the docs generator
* Conditions vs fields
* used to check a pod's state
* "don't use conditions too much"; other features require the use of conditions, so the guidance is uncertain
* What does a condition mean in this context?
* Additional fields that can carry `ready` with a message; they represent `state`.
* Limit on states that the object can be in.
* Use conditions to reflect the state of the world, is something blocked etc.
* Conditions were created to allow for mixed mode of clients, old clients can ignore some conditions while new clients can follow them. Designed to make it easier to extend status without breaking clients.
* Validating webhooks vs OpenAPI schema
* Can we write a test that spins up main API server in process?
* Can do that currently in some k/k tests, but not easy to consume
* vendoring is hard
* Currently have a bug where you have to serve aggregated APIs on 443,
so that might complicate things
* How are people testing extensions?
* Anyone reusing upstream dind cluster?
* People looking for a good way to test them.
* kube-builder uses the sig-testing framework to bring up a local control plane and use that to test against. (@pwittrock)
* How do you start cluster for e2es?
* Spin up a full cluster with kubeadm and run tests against that
* integration tests -- pull in packages that will build the clusters
* Q: what CIs are you using?
* A: Circle CI and then spin up new VMs to host cluster
* Mirantis has a tool for a multi-node dind cluster for testing
* #testing-commons channel on Slack. There is a 27-page document on this -- the link will be put in the slides
* Deploying and managing Validating/Mutating webhooks?
* how complex should they be?
* When to use subresources?
* Are people switching to api agg to use this today?
* Really just for status and scale
* Why not use subresources today with scale?
* multiple replicas fields
* doesn't fit polymorphic structure that exists
* pwittrock@: kubectl side, scale
* want to push special kubectl verbs into subresources to make kubectl
more tolerant to version skew
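A small sketch of the resync-period guidance from the discussion above (the in-cluster config, the 30-second period, and the variable names are illustrative assumptions, not recommendations from the session):

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes the controller runs in-cluster; error handling is trimmed.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Resync period 0: never resync, rely on the watch. Appropriate when the
	// controller reconciles Kubernetes objects only against other Kubernetes
	// objects, since the watch (plus automatic relists) delivers everything.
	kubeOnly := informers.NewSharedInformerFactory(clientset, 0)

	// Non-zero resync period: every cached object is periodically pushed back
	// through the handlers, giving the controller a chance to re-check state
	// held in an external system (cloud load balancer, DNS provider, ...).
	external := informers.NewSharedInformerFactory(clientset, 30*time.Second)

	_ = kubeOnly
	_ = external
}
```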
## Other Questions
* Q: Client-go generated listers, what is the reason for two separate interfaces to retrieve from client and cache?
* A: historical, but some things are better done local vs on the server.
* issues: client-set interface allows you to pass special options that allow you to do interesting stuff on the API server which isn't necessarily possible in the lister.
* started as same function call and then diverged
* lister gives you slice of pointers
* clientset gives you a slice of not pointers
* a lot of people would take the return from the clientset and convert it to a slice of pointers, so the listers help avoid having to do deep copies every time. TL;DR: the interfaces are not identical (see the lister/clientset sketch at the end of this section)
* Where should questions go on this topic for now?
* A: most goes to sig-api-machinery right now
* A: Controller-related stuff would probably be best for sig-apps
* Q: Staleness of data, how are people dealing with keeping data up to date with external data?
* A: Specify sync period on your informer, will put everything through the loop and hit external resources.
* Q: With strictly Kubernetes resources, should your sync period be never? i.e., does the watch return everything?
* A: The watch should return everything and should be used if its strictly k8s in and k8s out, no need to set the sync period.
* Q: What about controllers in other languages than go?
* A: [metacontroller](https://github.com/GoogleCloudPlatform/metacontroller). There are client libs in other languages; the missing pieces are the work queue, informers, etc.
* The Cluster API controllers (cluster, machineset, deployment) have a copy of the Deployment code for machines. Can we move this code into a library?
* A: it's a lot of work, someone needs to do it
* A: Janet Kuo is a good person to talk to (worked on getting core workloads
API to GA) about opinions on all of this
* Node name duplication caused issues with AWS and long-term caches
* make sure to store UIDs if you cache across reboot
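For the lister vs. clientset question above, a rough sketch of the two interfaces side by side (illustrative only; it uses the pre-context `List(metav1.ListOptions{})` signature from client-go of this era, and the `default` namespace is arbitrary):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Clientset: hits the API server, accepts ListOptions (label/field
	// selectors and other server-side options), and returns a PodList whose
	// Items are values, not pointers. Newer client-go releases also require
	// a context argument here.
	podList, err := clientset.CoreV1().Pods("default").List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fromServer := podList.Items

	// Lister: reads from the local informer cache and returns a slice of
	// pointers, so callers don't deep-copy on every List call.
	factory := informers.NewSharedInformerFactory(clientset, 0)
	podLister := factory.Core().V1().Pods().Lister()
	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, factory.Core().V1().Pods().Informer().HasSynced)

	fromCache, err := podLister.Pods("default").List(labels.Everything())
	if err != nil {
		panic(err)
	}

	var _ []corev1.Pod = fromServer // values from the clientset
	var _ []*corev1.Pod = fromCache // pointers from the lister
	fmt.Println(len(fromServer), "pods via clientset,", len(fromCache), "via lister")
}
```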
## Moving Forwards
* How do we share/disseminate knowledge (SIG PlatformDev?)
* Most SIGs maintain their own controllers
* Wiki? Developer Docs working group?
* Existing docs focus on in-tree development. Dedicated 'extending kubernetes' section?
* Git-book being developed for kubebuilder (book.kubebuilder.io); would appreciate feedback @pwittrock
* API extensions authors meetups?
* How do we communicate this knowledge for core kubernetes controllers
* Current-day: code review, hallway conversations
* Working group for platform development kit?
* Q: where should we discuss/have real time conversations?
* A: #sig-apimachinery, or maybe #sig-apps in slack (or mailing lists) for the workloads controllers


@ -0,0 +1,92 @@
# CRDs - future and pain points
**Lead:** sttts
**Slides:** combined with the client-go session [here](https://www.dropbox.com/s/n2fczhlbnoabug0/API%20extensions%20contributor%20summit.pdf?dl=0)
**Thanks to our notetakers:** mrbobbytales, kragniz, tpepper, and onyiny-ang
## outlook - aggregation
* API stable since 1.10. There is a lack of tools and library support.
* GSoC project with @xmudrii: share etcd storage
* `kubectl create etcdstorage your api-server`
* Store custom data in etcd
## outlook custom resources
1.11:
* alpha: multiple versions with/without conversion
* alpha: pruning - blocker for GA - unspecified fields are removed
* deep change of semantics of custom resources
* from JSON blob store to schema based storage
* alpha: defaulting - defaults from openapi validation schema are applied
* alpha: graceful deletion - (maybe? PR exists)
* alpha: server side printing columns for `kubectl get` customization
* beta: subresources - alpha in 1.10
* will have additionalProperties with extensible string map
* mutually exclusive with properties
1.12
* multiple versions with declarative field renames
* strict create mode (issue #5889)
Missing from Roadmap:
- Additional Properties: Forbid additional fields
- Unknown fields are silently dropped instead of erroring
- Istio uses CRDs extensively: proto requires some kind of verification and CRDs are JSON
- currently planning to go to GA without proto support
- possibly on the longer-term plan
- Resource Quotas for Custom Resources
- doable, we know how but not currently implemented
- Defaulting: mutating webhook will default things when they are written
- Is validation going to be required in the future? (see the CRD sketch after this list)
- poll the audience!
- gauging general sense of validation requirements (who wants them, what's missing?)
- missing: references to core types aren't allowed/can't be defined -- this can lead to versioning complications
- limit CRDs cluster-wide such that they don't affect all namespaces
- no good discussion about how to improve this yet
- feel free to start one!
- Server side printing columns, per resource type needs to come from server -- client could be in different version than server and highlight wrong columns
Autoscaling is alpha today, hopefully beta in 1.11
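As a rough illustration of the validation and subresources items above, here is a sketch of a CRD built with the apiextensions v1beta1 Go types (the `example.com`/`Widget` names are made up, the features shown were alpha/beta at the time of these notes so exact shapes may differ, and the `Create` call uses the pre-context signature of that era):

```go
package main

import (
	apiextensionsv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := apiextensionsclient.NewForConfigOrDie(config)

	minReplicas := float64(0)
	crd := &apiextensionsv1beta1.CustomResourceDefinition{
		ObjectMeta: metav1.ObjectMeta{Name: "widgets.example.com"},
		Spec: apiextensionsv1beta1.CustomResourceDefinitionSpec{
			Group:   "example.com",
			Version: "v1alpha1",
			Scope:   apiextensionsv1beta1.NamespaceScoped,
			Names: apiextensionsv1beta1.CustomResourceDefinitionNames{
				Plural: "widgets",
				Kind:   "Widget",
			},
			// OpenAPI v3 validation schema: the piece the session said may
			// eventually become required.
			Validation: &apiextensionsv1beta1.CustomResourceValidation{
				OpenAPIV3Schema: &apiextensionsv1beta1.JSONSchemaProps{
					Properties: map[string]apiextensionsv1beta1.JSONSchemaProps{
						"spec": {
							Properties: map[string]apiextensionsv1beta1.JSONSchemaProps{
								"replicas": {Type: "integer", Minimum: &minReplicas},
							},
						},
					},
				},
			},
			// Status and scale subresources (beta in 1.11 per the notes).
			Subresources: &apiextensionsv1beta1.CustomResourceSubresources{
				Status: &apiextensionsv1beta1.CustomResourceSubresourceStatus{},
				Scale: &apiextensionsv1beta1.CustomResourceSubresourceScale{
					SpecReplicasPath:   ".spec.replicas",
					StatusReplicasPath: ".status.replicas",
				},
			},
		},
	}

	// Register the CRD. Newer client-go releases add context and options
	// arguments to Create; this uses the 1.10/1.11-era signature.
	if _, err := client.ApiextensionsV1beta1().CustomResourceDefinitions().Create(crd); err != nil {
		panic(err)
	}
}
```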
## The Future: Versioning
* Most-asked feature, coming... but slowly
* two types, "noConversion" and "Declarative Conversion"
* "NoConversion" versioning
* maybe in 1.11
* ONLY change is apiGroup
* Run multiple versions at the same time, they are not converted
* "Declarative Conversion" 1.12
* declarative rename, e.g.
```
spec:
  group: kubecon.io
  version: v1
  conversions:
    declarative:
      renames:
      - from: v1alpha1
        to: v1
        old: spec.foo
        new: bar
```
* Support for webhook?
* not currently, very hard to implement
* complex problem for end user
* the current need is really only changing single fields
* Trying to avoid complexity by adding a lot of conversions
## Questions:
* When should someone move to their own API server?
* At the moment, we tell people to start with CRDs; move to an aggregated API server only if you need custom versioning or have other specific use-cases.
* How do I update everything to a new object version?
* Have to touch every object.
* Is protobuf support coming in the future?
* possibly, likely yes
* update on resource quotas for CRDs
* A PoC PR is currently out; it's doable, just not quite done
* Is validation field going to be required?
* Eventually, yes? Some work being done to make CRDs work well with `kubectl apply`
* Can CRDs be cluster-wide but visible to only some users?
* It's been discussed, but hasn't been tackled.
* Is there support for CRDs in kubectl output?
* server side printing columns will make things easier for client tooling output. Versioning is important for client vs server versioning.


@ -0,0 +1,63 @@
# Developer Tools
**Leads:** errordeveloper, r2d4
**Slides:** n/a
**Thanks to our notetakers:** mrbobbytales, onyiny-ang
What APIs should we target, what parts of the developer workflow haven't been covered yet?
* Do you think the Developer tools for Kubernetes is a solved problem?
* A: No
### Long form responses from SIG Apps survey
* Need to talk about developer experience
* Kubernetes Community can do a lot more in helping evangelize Software development workflow, including CI/CD. Just expecting some guidelines on the more productive ways to write software that runs in k8s.
* Although my sentiment is neutral on kube, it is getting better as more tools are emerging to allow my devs to stick to app development and not get distracted by kube items. There is a lot of tooling available, which is a double-edged sword; these tools range greatly in usability, robustness, and security. So it takes a lot of effort to...
### Current State of Developer Experience
* Many Tools
* Mostly incompatible
* Few end-to-end workflows
### Comments and Questions
* Idea from Skaffold to normalize the interface for builders, to be able to swap them out behind the scenes.
* Possible to formalize these as CRDs?
* Lots of choices: Helm, other templating, Kompose, etc.
* So much flexibility in the Kubernetes API that it can become complicated for new developers coming up.
* Debug containers might make things easier for developers to work through building and troubleshooting their app.
* Domains and workflow are so different from companies that everyone has their own opinionated solution.
* Lots of work being done in the app def working group to define what an app is.
* app CRD work should make things easier for developers.
* Break out developer workflow into stages and try and work through expanding them, e.g. develop/debug
* debug containers are looking to be used both in prod and developer workflows
* Tool in sig-cli called kustomize, was previously 'konflate'?
* Hard to talk about all these topics as there isn't the language to talk about these classes of tools.
* @jacob investigation into application definition: re: phases, it's not just build, deploy, debug; it's build, deploy, lifecycle, debug. Managing lifecycle is still a problem; '1-click deploy' doesn't handle lifecycle.
* @Bryan Liles: thoughts about why this is hard:
* kubectl and helm apply objects in different orders
* objects vs abstractions
* some people love [ksonnet](https://ksonnet.io/), some hate it. Kubernetes concepts are introduced differently to different people, so not everyone is starting with the same base. Thus, some tools are harder for some people to grasp than others. Shout out to everyone who's trying to work through it.
* Being tied to one tool breaks compatibility across providers.
* Debug containers are great for break-glass scenarios
* CoreOS had an operator that handled the entire stack, additional objects could be created and certain metrics attached.
* Everything is open source now, etcd, prometheus operator
* Tools are applying things in different orders, and this can be a problem across tooling
* People who depend on startup order also tend to have reliability problems, as they have their own operational problems; they should try to engineer around it.
* Can be hard if going crazy on high-level abstractions, can make things overly complicated and there are a slew of constraints in play.
* Ordering constraints are needed for certain garbage collection tasks, having ordering may actually be useful.
* Some groups have avoided high-level DSLs because people should understand readiness/liveness probes etc. Developers may have a learning curve, but it's worthwhile when troubleshooting and getting into the weeds.
* Lots of people don't want to get into it at all, they want to put in a few details on a db etc and get it.
* Maybe standardize on a set of labels for things that should be managed as a group. Helm is one implementation; it should go beyond helm.
* There is a PR that is out there that might take care of some of this.
* Everyone has their own "style" when it comes to this space.
* Break the phases and components in the development and deployment workflow into sub-problems so they can actually be tackled. Right now the community seems to be tackling everything at once and developing different tools to do the same thing.
* build UI that displays the whole thing as a list and allows easy creation/destruction of cluster
* avoid tools that would prevent portability
* objects rendered to file somehow: happens at runtime, an additional operator that takes care of the stack
* 3, 4 minor upgrades without breakage
* @Daniel Smith: start up order problems = probably bigger problems, order shouldn't need to matter but in the real world sometimes it does
* platform team, internal paths team (TSL like theme), etc. In some cases it's best to go crazy focusing on the abstractions--whole lot of plumbing that needs to happen to get everything working properly
* Well defined order of creation may not be a bad thing. ie. ensure objects aren't created that are immediately garbage collected.
* Taking a step back from being contributors, put on our developer hats and consider the tool sprawl that exists and is not necessarily compatible across different aspects of Kubernetes. Is there any way to consolidate these tools and make them more standardized?
* Split into sub-problems
## How can we get involved?
- SIG-Apps - join the conversation on slack, mailing list, or weekly Monday meeting


@ -0,0 +1,129 @@
# Networking
**Lead:** thockin
**Slides:** [here](https://docs.google.com/presentation/d/1Qb2fbyTClpl-_DYJtNSReIllhetlOSxFWYei4Zt0qFU/edit#slide=id.g2264d16f0b_0_14)
**Thanks to our notetakers:** onyiny-ang, mrbobbytales, tpepper
This session is not declaring what's being implemented next, but rather laying out the problems that loom.
## Coming soon
- kube-proxy with IPVS
- currently beta
- CoreDNS replacing kube-dns
- currently beta
- pod "ready++"
- allow external systems to participate in rolling updates. Say your load-balancer takes 5-10 seconds to program; when you bring up a new pod and take down an old pod, the load balancer has lost the old backends but hasn't yet added the new ones. An external dependency like this becomes a readiness gate on the pod.
- adds configuration to the pod to easily verify readiness (see the sketch after this list)
- design agreed upon, alpha (maybe) in 1.11
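A minimal sketch of what pod "ready++" looks like from the API side: a `readinessGates` entry on the pod spec whose condition an external controller flips to True before the pod counts as Ready (the condition type and pod below are made-up illustrations, and the feature was still alpha at the time of these notes):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "web-0"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				{Name: "web", Image: "nginx:1.13"},
			},
			// Pod "ready++": the pod is not considered Ready until every listed
			// condition is True, so an external system (e.g. a load-balancer
			// controller) can gate rolling updates until its programming is done.
			ReadinessGates: []corev1.PodReadinessGate{
				{ConditionType: "example.com/lb-registered"},
			},
		},
	}
	fmt.Printf("%+v\n", pod.Spec.ReadinessGates)
}
```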
## Ingress
* The lowest common-denominator API. This is really limiting for users, especially compared to modern software L7 proxies.
* annotation model of markup limits portability
* ingress survey reports:
* people want portability
* everyone uses non-portable features…
* 2018 L7 requirements are dramatically higher than what they were, and many vendors don't support that level of functionality.
* Possible Solution? Routes
* openshift uses routes
* heptio prototyping routes currently
* All things considered, requirements are driving it closer and closer to Istio
* One possibility: poach some of the ideas and add them natively to Kubernetes.
## Istio
(as a potential solution)
- maturing rapidly with good APIs and support
- Given that, plus the fact that Istio is not part of Kubernetes, it's unlikely in the near term to become a default or required part of a k8s deployment. The general ideas around Istio-style service mesh could be made more native in k8s.
## Topology and node-local Services
- demand for node-local network and service discovery but how to go about it?
- e.g. “I want to talk to the logging daemon on my current host”
- special-case topology?
- client-side choice
- These types of services should not be a service proper.
## Multi-network
- certain scenarios demand multi-network
- A pod can be in multiple networks at once. You might have different quality of service on different networks (eg: fast/expensive, slower/cheaper), or different connectivity (eg: the rack-internal network).
- Tackling scenarios like NFV
- need deeper changes like multiple pod IPs but also need to avoid repeating old mistakes
- SIG-Network WG designing a PoC -- If interested jump on SIG-network WG weekly call
- Q: Would this PoC help if virtual-kubelets were used to span cloud providers? Spanning latency domains in networks is also complicated. Many parts of k8s are chatty, assuming a cluster internal low-latency connectivity.
## Net Plugins vs Device Plugins
- These plugins do not coordinate today and are difficult to work around
- gpu that is also an infiniband device
- causes problems because network and device are very different with verbs etc
- problems encountered with having to schedule devices and network together at the same time.
- “I want a gpu on this host that has a gpu attached, and I want it to be the same device”
- A PoC is available to make this work, but it's rough and a problem right now.
- Resources WG and networking SIG are discussing this challenging problem
- SIGs/WGs. Conversation may feel like a cycle, but @thockin feels it is a spiral that is slowly converging and he has a doc he can share covering the evolving thinking.
## Net Plugins, gRPC, Services
- tighter coupling between netplugins and kube-proxy could be useful
- grpc is awesome for plugins, why not use a grpc network plugin
- pass services to network plugin to bypass kube-proxy, give more awareness to the network plugin and enable more functionality.
## IPv6
- beta but **no** support for dual-stack (v4 & v6 at the same time)
- Need deeper changes like multiple pod IPs (need to change the pod API--see Multi-network)
- https://github.com/kubernetes/features/issues/563
## Services v3
- Services + Endpoints have a grab-bag of features which is not ideal; "grew organically"
- Need to start segmenting the "core" API group
- write API in a way that is more obvious
- split things out and reflect it in API
- Opportunity to rethink and refactor:
- Endpoints -> Endpoint?
- split the grouping construct from the “gazintas”
- virtualIP, network, dns name moves into the service
- EOL troublesome features
- port remapping
## DNS Reboot
- We abuse DNS and mess up our DNS schema
- it's possible to write queries in DNS that take over names
- @thockin has a doc with more information about the details of this
- Why can't I use more than 6 web domains? bugzilla circa 1996
- problem: it's possible to write queries in DNS that write over names
- create a namespace called “com” and an app named “google” and it'll cause a problem
- “svc” is an artifact and should not be a part of dns
- issues with certain underlying libraries
- Changing it is hard (if we care about compatibility)
- Can we fix the DNS spec or use "enlightened" DNS servers?
- Smart proxies on behalf of pods that do the searching and become a “better” dns
- External DNS
- Creates DNS entries in external system (route53)
- Currently in incubator, not sure on status, possibly might move out of incubator, but unsure on path forward
## Perf and Scalability
- iptables is crufty. An nftables implementation should be better.
- an eBPF implementation (e.g. Cilium) has potential
## Questions:
- Consistent mechanism to continue progress but maintain backwards compatibility
- External DNS was not mentioned -- blue/green traffic switching
- synchronizes kubernetes resources into various Kubernetes services
- it's in incubator right now (deprecated)
- unsure of the future trajectory
- widely used in production
- relies sometimes on annotations and ingress
- Q: Device plugins. . .spiraling around and hoping for eventual convergence/simplification
- A: Resource management on device/net plugin, feels like things are going in a spiral, but progress is being made, it is a very difficult problem and hard to keep all design points tracked. Trying to come to consensus on it all.
- Q: Would CoreDNS be the best place for the plugins and other modes for DNS proxy etc.
- loss of packets are a problem -- long tail of latency
- encourage cloud providers to support gRPC
- Q: With the issues talked about earlier, why can't Istio be integrated natively?
- A: Istio can't be required/default: still green
- today we can't proclaim that Kubernetes must support Istio
- probably not enough community support this year (not everyone is using it at this point)
- Q: Thoughts on k8s v2?
- A: Things will not just be turned off, things must be phased out and over the course of years, especially for services which have been core for some time.
## Takeaways:
- This is not a comprehensive list of everything that is up and coming
- A lot of work went into all of these projects


@ -0,0 +1,13 @@
# Steering Committee Update
**Leads:** pwittrock, timothysc
**Thanks to our notetaker:** tpepper
* incubation is deprecated, "associated" projects are a thing
* WG are horizontal across SIGs and are ephemeral. Subprojects own a piece
of code and relate to a SIG. Example: SIG-Cluster-Lifecycle with
kubeadm, kops, etc. under it.
* SIG charters: PR a proposed new SIG with the draft charter. Discussion
can then happen on GitHub around the evolving charter. This is cleaner
and more efficient than discussing on mailing list.
* K8s values doc updated by Sarah Novotny
* changes to voting roles and rules are in the works