adding historical notes for 2014 contributor summit

This commit is contained in: parent f8c08b18ff, commit 6f726048ae

@@ -0,0 +1,200 @@
# Kubernetes Contributor Conference, 2014-12-03 to 12-05

**Full notes:** https://docs.google.com/document/d/1cQLY9yeFgxlr_SRgaBZYGcJ4UtNhLAjJNwJa8424JMA/edit?usp=sharing (has pictures; shared with the k-dev mailing list)

**Organizers:** thockin and bburns

**26 Attendees from:** Google, Red Hat, CoreOS, Box

**This is a historical document. No typo or grammar correction PRs needed.**

Last modified: Dec. 8, 2014
# Clustering and Cluster Formation

Goal: Decide how clusters should be formed and resized over time

Models for building clusters:

* Master in charge - asset DB
    * Dynamic join - ask to join
* How Kelsey Hightower has seen this done on bare metal
    * Use Fleet as a machine database
    * A Fleet agent is run on each node
    * Each node registers its information in etcd when it comes up
    * The only security is that etcd expects the node to have a cert signed by a specific CA
    * Run an etcd proxy on each node
    * Don't run any salt scripts; everything is declarative
    * Just put a daemon (kube-register) on a machine to have it become part of the cluster
    * brendanburns: basically using Fleet as the cloud provider
* Puppet model - whitelist some cert and/or subnet that you want to trust everything in
    * One problem - if the CA leaks, you have to replace certs on all nodes
* briangrant: we may want to support adding nodes that aren't trusted, only scheduling work from the nodes' owner on them
* lavalamp: we need to differentiate between node states:
    * In the cluster
    * Ready to accept work
    * Trusted to accept work
* Proposal (a minimal registration sketch follows this section):
    * New nodes initiate contact with the master
    * Allow multiple config options for how trust can be established - IP, cert, etc.
    * Each new node only needs one piece of information - how to find the master
    * Can support many different auth modes - let anyone in, whitelist IPs, a particular signed cert, queue up requests for an admin to approve, etc.
    * Default should be auto-register with no auth/approval needed
    * Auth-ing is separate from registering
    * Support switching between permissive and strict auth modes:
        * Each node should register a public key so that if the auth mode is changed to require a cert upon registration, old nodes won't break
    * kelseyhightower: let the minion do the same thing that kube-register currently does
    * Separate adding a node to the cluster from declaring it schedulable
* Use cases:
    * Kick the tires, everything should be automagic
    * Professional that needs security
* Working group for later: Joe, Kelsey, Quintin, Eric Paris
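A rough sketch of the proposed registration flow, in Go. This is illustrative only: the `/register` endpoint, the `RegistrationRequest` type, and the master URL are hypothetical and not the real kubelet or kube-register code; the point is that a node needs only the master's address, sends a public key up front, and leaves trust/schedulability decisions to a pluggable policy on the master.

```go
// Minimal sketch of the proposed node self-registration flow (hypothetical
// names and endpoint; not the real kubelet or kube-register code).
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// RegistrationRequest is what a new node would send to the master: how to
// reach it plus a public key, so that a later switch to strict auth does not
// break nodes that registered while the cluster was in permissive mode.
type RegistrationRequest struct {
	HostIP    string `json:"hostIP"`
	PublicKey string `json:"publicKey"`
}

// register contacts the master (the only piece of information a new node
// needs) and asks to join. Whether the node is then trusted/schedulable is a
// separate decision made on the master side by a pluggable policy
// (allow-all, IP whitelist, signed cert, or queue for admin approval).
func register(masterURL string, req RegistrationRequest) error {
	body, err := json.Marshal(req)
	if err != nil {
		return err
	}
	resp, err := http.Post(masterURL+"/register", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusAccepted {
		return fmt.Errorf("registration rejected: %s", resp.Status)
	}
	return nil
}

func main() {
	err := register("http://master.example.com:8080", RegistrationRequest{
		HostIP:    "10.240.0.7",
		PublicKey: "ssh-rsa AAAA... (placeholder)",
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("registered; waiting to be marked schedulable")
}
```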
# Usability

* Getting started
    * Want easy entry for Docker users
    * Library/registry of pod templates
* GUI - visualization of relationships and dependencies, workflows, dashboards, ways to learn, first impressions
    * Will be easiest to start with a read-only UI before worrying about read-write workflows
* Docs
    * Need to refactor getting started guides so that there's one common guide
    * Each cloud provider will just have its own short guide on how to create a cluster
    * Need a simple test that can verify whether your cluster is healthy or diagnose why it isn't
    * Make it easier to get to architecture/design doc from front page of github project
    * Table of contents for docs?
    * Realistic examples
    * Kelsey has found that doing a tutorial of deploying with a canary helped make the value of labels clear
* CLI
    * Annoying when local auth files and config get overwritten when trying to work with multiple clusters
        * Like when running e2e tests
* Common friction points
    * External IPs
    * Image registry
    * Secrets
    * Deployment
    * Stateful services
    * Scheduling
    * Events/status
    * Log access
* Working groups
    * GUI - Jordan, Brian, Max, Satnam
    * CLI - Jeff, Sam, Derek
    * Docs - Proppy, Kelsey, TJ, Satnam, Jeff
    * Features/Experience - Dawn, Rohit, Kelsey, Proppy, Clayton: https://docs.google.com/document/d/1hqn6FtBNMe0sThbciq2PbE_P5BONBgCzHST4gz2zj50/edit
# v1beta3 discussion

12-04-2014
Network -- breakout

* Dynamic IP
    * Once we support live migration, the IP assigned to each pod has to move with it, which might break the underlying network.
    * We don't have introspection, which makes supporting various network topologies harder.
    * External IPs are an important part.
    * There's a kick-the-tires mode and a full-on mode (for GCE, AWS - fully featured).
    * How do we select the kick-the-tires option? Weave, Flannel, Calico: pick one.
    * Someone should do a comparison. thockin@ would like help evaluating these technologies against some benchmarks. Eric Paris can help - he has a bare-metal setup. We'll have a benchmark setup for evaluation.
    * We need at least two real use cases - a webserver example, and whether 10 pods can find each other. lavalamp@ is working on a test.
    * If Docker picks up a plugin model, we can use that.
    * Clusters will change dynamically, so we need to design a flexible network plugin API to accommodate this.
    * Flannel does two things: network allocation through etcd and traffic routing with overlays. It also programs underlay networks (like GCE). Flannel will do IP allocation rather than hard-coding it.
    * One special use case: only 20 IPs can be allocated per node. The scheduler might need to know about this limitation: OUT-OF-IP(?)
    * Different cloud providers, but OVS is a common mechanism.
    * We might need Network Grids in the end.
    * ACTIONS: better docs, tests.
* Public Services
    * Hard problem: we have to scale on GCE, but the GCE load balancer cannot target an arbitrary IP; for now it can only target a VM.
    * Until we have an external IP, you cannot build an HA public service.
    * We can run Digital Ocean on top of Kubernetes.
    * Issue: when starting a public service, an internal IP is assigned. It is accessible from nodes within the cluster, but not from outside. Now that we have 3-tier services, how do we access one service from outside? The issue is how to take this internally accessible service and externalize it. General solution: forward traffic from outside to the internal IP. First action: teach Kubernetes the mapping.
    * We need a registry of those public IPs. All traffic coming to such an IP will be forwarded to the proper internal IP.
    * A public service can register with DNS and do intermediate load balancing outside the cluster / Kubernetes. A label query identifies the endpoints.
    * The k8s proxy can be the L3 LB: it listens on the external IPs and talks to the k8s service DB to find internal services; traffic then goes to an L7 LB, which could be HAProxy scheduled as a pod, which talks to the pods DB and finds a set of pods to forward the traffic to. (See the sketch after this list.)
    * Two types of services: mapping external IPs, and an L3 LB to map to pods. The L7 LB can access the IPs assigned to pods.
    * Policy: as more nodes are added, more external IPs can be used.
    * Issue 1: how to map an external IP to a list of pods (the L3 LB part).
    * Issue 2: how to slice those external IPs: general pool vs. private pools.
* IP-per-service, visibility, segmenting
* Scale
* MAC
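A minimal sketch of the external-to-internal forwarding idea discussed above, in Go: a TCP forwarder standing in for the L3 LB role, with a hypothetical registry mapping a public IP:port to an internal service IP:port. The addresses are made up, and this is not kube-proxy or HAProxy configuration; an L7 LB such as HAProxy running as a pod would sit behind this and pick among pods by label.

```go
// Minimal sketch of "forward outside traffic to an internal service IP"
// (an L4/TCP stand-in for the L3 LB role described above); addresses and
// the external-IP-to-service mapping are illustrative only.
package main

import (
	"io"
	"log"
	"net"
)

// externalToInternal is the registry discussed above: traffic arriving on a
// public IP:port is forwarded to the corresponding internal service IP:port.
var externalToInternal = map[string]string{
	"0.0.0.0:80": "10.0.0.42:9376", // hypothetical internal service address
}

func forward(external, internal string) {
	ln, err := net.Listen("tcp", external)
	if err != nil {
		log.Fatal(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			backend, err := net.Dial("tcp", internal)
			if err != nil {
				log.Print(err)
				return
			}
			defer backend.Close()
			// Shuttle bytes both ways; an L7 LB (e.g. HAProxy as a pod)
			// could sit behind this to pick among pods by label.
			go io.Copy(backend, c)
			io.Copy(c, backend)
		}(client)
	}
}

func main() {
	for ext, internal := range externalToInternal {
		go forward(ext, internal)
	}
	select {} // run forever
}
```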
# Roadmap

* Should be driven by scenarios / use cases -- breakout
* Storage / stateful services -- breakout
    * Clustered databases / kv stores
        * Mongo
        * MySQL master/slave
        * Cassandra
        * etcd
        * zookeeper
        * redis
        * ldap
    * Alternatives
        * local storage
        * durable volumes
        * identity associated with volumes
        * lifecycle management
        * network storage (ceph, nfs, gluster, hdfs)
        * volume plugin
        * flocker - volume migration
        * "durable" data (as reliable as host)
* Upgrading Kubernetes
    * master components
    * kubelets
    * OS + kernel + Docker
* Usability
    * Easy cluster startup
        * Minion registration
        * Configuring k8s
            * move away from flags in master
            * node config distribution
                * kubelet config
                * dockercfg
    * Cluster scaling
    * CLI + config + deployment / rolling updates
    * Selected workloads
* Networking
    * External IPs
    * DNS
    * Kick-the-tires networking implementation
* Admission control not required for 1.0
* v1 API + deprecation policy
* Kubelet API well defined and versioned
* Basic resource-aware scheduling -- breakout
    * require limits?
    * auto-sizing
* Registry
    * Predictable deployment (config-time image resolution)
    * Easy code->k8s
    * Simple out-of-the-box setup
        * One or many?
        * Proxy?
        * Service?
    * Configurable .dockercfg
* Productionization
    * Scalability
        * 100 for 1.0
        * 1000 by summer 2015
    * HA master -- not gating 1.0
        * Master election
        * Eliminate global in-memory state
            * IP allocator
            * Operations
        * Sharding
            * Pod getter
    * Kubelets need to coast when the master is down
        * Don't blow away pods when the master is down
    * Testing
        * More/better/easier E2E
        * E2E integration testing w/ OpenShift
        * More non-E2E integration tests
        * Long-term soaking / stress test
    * Backward compatibility
    * Release cadence and artifacts
    * Export monitoring metrics (instrumentation)
    * Bounded disk space on master and kubelets
        * GC of unused images
* Docs
    * Reference architecture
* Auth[nz]
    * plugins + policy
    * admin
    * user->master
    * master component->component: localhost in 1.0
    * kubelet->master
@@ -1,4 +1,4 @@
events/elections/2017/
vendor/
sig-contributor-experience/contribex-survey-2018.csv
events/2014