Kubernetes Contributor Conference, 2014-12-03 to 12-05

Full notes (has pictures; shared with the k-dev mailing list): https://docs.google.com/document/d/1cQLY9yeFgxlr_SRgaBZYGcJ4UtNhLAjJNwJa8424JMA/edit?usp=sharing
Organizers: thockin and bburns
26 Attendees from: Google, Red Hat, CoreOS, Box
This is a historical document. No typo or grammar correction PRs needed.

Last modified: Dec. 8, 2014

Clustering and Cluster Formation

Goal: Decide how clusters should be formed and resized over time

Models for building clusters

  • Master in charge - asset DB
  • Dynamic join - ask to join
  • How Kelsey Hightower has seen this done on bare metal
    Use Fleet as a machine database
    A Fleet agent is run on each node
    Each node registers its information in etcd when it comes up
    Only security is that etcd expects the node to have a cert signed by a specific CA
    Run an etcd proxy on each node
    Don't run any salt scripts, everything is declarative
    Just put a daemon (kube-register) on a machine to become part of the cluster
    brendanburns: basically using Fleet as the cloud provider
  • Puppet model - whitelist some cert and/or subnet that you want to trust everything in
    One problem - if CA leaks, have to replace certs on all nodes
  • briangrant: we may want to support adding nodes that aren't trusted, only scheduling work from the nodes' owner on them
  • lavalamp: we need to differentiate between node states:
    In the cluster
    Ready to accept work
    Trusted to accept work
  • Proposal:
    New nodes initiate contact with the master
    Allow multiple config options for how trust can be established - IP, cert, etc.
    Each new node only needs one piece of information - how to find the master
    Can support many different auth modes - let anyone in, whitelist IPs, a particular signed cert, queue up requests for an admin to approve, etc. (a rough sketch follows after this list)
    Default should be auto-register with no auth/approval needed
    Auth-ing is separate from registering
    Supporting switching between permissive and strict auth modes:
    Each node should register a public key such that if the auth mode is changed to require a cert upon registration, old nodes won't break
    kelseyhightower: let the minion do the same thing that kube-register currently does
    Separate adding a node to the cluster from declaring it as schedulable
  • Use cases:
    Kick the tires, everything should be automagic
    Professional that needs security
  • Working group for later: Joe, Kelsey, Quintin, Eric Paris
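
The proposal above amounts to: nodes initiate contact with the master, and whether a registration is accepted is a pluggable policy that can be tightened later without breaking existing nodes. Below is a minimal, hypothetical Go sketch of that shape - the /register endpoint, the type names, and the policies are invented for illustration and are not the actual Kubernetes API of the time.

```go
// Hypothetical sketch (not the real Kubernetes API): a master-side
// registration endpoint with pluggable approval policies, illustrating
// "new nodes initiate contact" with auth kept separate from registration.
package main

import (
	"encoding/json"
	"net"
	"net/http"
)

// NodeRegistration is what a new node sends: how to reach it, plus a public
// key it keeps using even if the cluster later switches to strict auth.
type NodeRegistration struct {
	Address   string `json:"address"`
	PublicKey string `json:"publicKey"`
}

// ApprovalPolicy decides whether a registering node may join. Swapping the
// policy switches the cluster between permissive and strict modes.
type ApprovalPolicy interface {
	Approve(r *NodeRegistration, remoteIP string) bool
}

// AllowAll is the "kick the tires" default: auto-register, no approval.
type AllowAll struct{}

func (AllowAll) Approve(*NodeRegistration, string) bool { return true }

// IPAllowlist only admits nodes whose source IP is in a trusted set.
type IPAllowlist struct{ Trusted map[string]bool }

func (p IPAllowlist) Approve(_ *NodeRegistration, remoteIP string) bool {
	return p.Trusted[remoteIP]
}

// registerHandler accepts a registration and applies the policy.
func registerHandler(policy ApprovalPolicy) http.HandlerFunc {
	return func(w http.ResponseWriter, req *http.Request) {
		var reg NodeRegistration
		if err := json.NewDecoder(req.Body).Decode(&reg); err != nil {
			http.Error(w, "bad registration", http.StatusBadRequest)
			return
		}
		ip, _, _ := net.SplitHostPort(req.RemoteAddr)
		if !policy.Approve(&reg, ip) {
			http.Error(w, "registration not approved", http.StatusForbidden)
			return
		}
		// Record the node and its public key as "in the cluster";
		// marking it schedulable/trusted is a separate step.
		w.WriteHeader(http.StatusAccepted)
	}
}

func main() {
	http.Handle("/register", registerHandler(AllowAll{}))
	http.ListenAndServe(":8080", nil)
}
```

Switching from the permissive default to a stricter mode would then just swap the policy; because each node registers a public key up front, nodes admitted under the permissive mode can still authenticate afterwards.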

Usability

  • Getting started
    Want easy entry for Docker users
    Library/registry of pod templates
  • GUI - visualization of relationships and dependencies, workflows, dashboards, ways to learn, first impressions
    Will be easiest to start with a read-only UI before worrying about read-write workflows
  • Docs
    Need to refactor getting started guides so that there's one common guide
    Each cloud provider will just have its own short guide on how to create a cluster
    Need a simple test that can verify whether your cluster is healthy or diagnose why it isn't
    Make it easier to get to architecture/design doc from front page of github project
    Table of contents for docs?
    Realistic examples
    Kelsey has found that doing a tutorial of deploying with a canary helped make the value of labels clear (a small label-selector sketch follows after this list)
  • CLI
    Annoying when local auth files and config get overwritten when trying to work with multiple clusters
    Like when running e2e tests
  • Common friction points
    External IPs
    Image registry
    Secrets
    Deployment
    Stateful services
    Scheduling
    Events/status
    Log access
  • Working groups
    GUI - Jordan, Brian, Max, Satnam
    CLI - Jeff, Sam, Derek
    Docs - Proppy, Kelsey, TJ, Satnam, Jeff
    Features/Experience - Dawn, Rohit, Kelsey, Proppy, Clayton: https://docs.google.com/document/d/1hqn6FtBNMe0sThbciq2PbE_P5BONBgCzHST4gz2zj50/edit
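
As a companion to the canary-tutorial point above, here is a tiny, invented Go sketch of equality-based label selection (not the real Kubernetes client API): both stable and canary pods carry app=web, so a selector of {app: web} spans them all, while {app: web, track: canary} picks out only the canary.

```go
// Conceptual sketch of why labels help with a canary rollout; the pod data
// and selectors below are invented for illustration.
package main

import "fmt"

// matchesSelector returns true if every key/value in the selector is present
// in the pod's labels (equality-based selector semantics).
func matchesSelector(labels, selector map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	pods := map[string]map[string]string{
		"web-1":      {"app": "web", "track": "stable"},
		"web-2":      {"app": "web", "track": "stable"},
		"web-canary": {"app": "web", "track": "canary"},
	}

	service := map[string]string{"app": "web"}                       // routes to all web pods
	canaryOnly := map[string]string{"app": "web", "track": "canary"} // just the canary

	for name, labels := range pods {
		fmt.Printf("%s: in service=%v, canary=%v\n",
			name, matchesSelector(labels, service), matchesSelector(labels, canaryOnly))
	}
}
```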

v1beta3 discussion

12-04-2014
Network -- breakout

  • Dynamic IP
    Once we support live migration, the IP assigned to each pod has to move with it, which might break the underlying network.
    We don't have introspection, which makes supporting various network topologies harder.
    External IPs are an important part.
    There's a kick-the-tires mode and a full-on mode (for GCE, AWS - fully featured).
    How do we select kick-the-tires? Weave, Flannel, Calico: pick one.
    Someone should do a comparison. thockin@ would like help in evaluating these technologies against some benchmarks. Eric Paris can help - has a bare-metal setup. We'll have a benchmark setup for evaluation.
    We need at least two real use cases - a webserver example; can 10 pods find each other? lavalamp@ is working on a test.
    If Docker picks up a plugin model, we can use that.
    Clusters will change dynamically; we need to design a flexible network plugin API to accommodate this.
    Flannel does two things: network allocation through etcd and traffic routing w/ overlays. It also programs underlay networks (like GCE). Flannel will do IP allocation, not hard-coded.
    One special use case: a node may have only 20 IPs available to allocate. The scheduler might need to know about this limitation: OUT-OF-IP(?)
    Different cloud providers, but OVS is a common mechanism.
    We might need Network Grids in the end.
    ACTIONS: better docs, tests.
  • Public Services
    Hard problem: we have to scale on GCE, and the GCE load balancer cannot target an arbitrary IP, only a VM for now.
    Until we have an external IP, you cannot build an HA public service.
    We can run Digital Ocean on top of Kubernetes
    Issue: when starting a public service, an internal IP is assigned. It is accessible from nodes within the cluster, but not from outside. Now that we have 3-tier services, how do we access one service from outside? The issue is how to externalize this internally accessible service. General solution: forward outside traffic to the internal IP. First action: teach Kubernetes the mapping.
    We need a registry of those public IPs. All traffic arriving at such an IP will be forwarded to the proper internal IP.
    A public service can register with DNS and do intermediate load balancing outside the cluster / Kubernetes. A label query identifies the endpoints.
    The k8s proxy can be the L3 LB: it listens on the external IPs and talks to the k8s service DB to find internal services; traffic then goes to an L7 LB, which could be HAProxy scheduled as a pod, which talks to the pods DB and finds a set of pods to forward the traffic to.

Two types of services: mapping external IPs, and an L3 LB that maps to pods. The L7 LB can access the IPs assigned to pods.
Policy: adding more nodes means more external IPs can be used.
Issue 1: how to map an external IP to a list of pods - the L3 LB part (a rough sketch follows at the end of this section).
Issue 2: how to slice those external IPs: a general pool vs. private pools.

  • IP-per-service, visibility, segmenting
  • Scale
  • MAC
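
To make the two-tier flow above concrete, here is a rough, hypothetical Go sketch of just the L3 piece: listen on an external IP and forward connections to the internal service IP it maps to. The serviceDB map and all names are invented stand-ins for the real service registry; this is not how the kube-proxy was implemented.

```go
// Conceptual L3 forwarder: external IP:port -> internal service IP:port.
package main

import (
	"io"
	"log"
	"net"
)

// serviceDB stands in for the k8s service DB: which internal service IP
// sits behind each external IP. Hard-coded here for illustration.
var serviceDB = map[string]string{
	"203.0.113.10:80": "10.0.0.42:80",
}

// forward accepts connections on the external address and pipes each one to
// the internal service address.
func forward(externalAddr, internalAddr string) {
	l, err := net.Listen("tcp", externalAddr)
	if err != nil {
		log.Fatalf("listen on %s: %v", externalAddr, err)
	}
	for {
		client, err := l.Accept()
		if err != nil {
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			backend, err := net.Dial("tcp", internalAddr)
			if err != nil {
				return
			}
			defer backend.Close()
			// Copy bytes in both directions; per-request (L7) routing
			// happens behind the internal IP.
			go io.Copy(backend, c)
			io.Copy(c, backend)
		}(client)
	}
}

func main() {
	for ext, in := range serviceDB {
		go forward(ext, in)
	}
	select {} // block until killed
}
```

An L7 balancer such as HAProxy, scheduled as a pod behind the internal IP, would then do per-request routing to the member pods found via a label query.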

Roadmap

  • Should be driven by scenarios / use cases -- breakout
  • Storage / stateful services -- breakout
    Clustered databases / kv stores
    Mongo
    MySQL master/slave
    Cassandra
    etcd
    zookeeper
    redis
    ldap
    Alternatives
    local storage
    durable volumes
    identity associated with volumes
    lifecycle management
    network storage (ceph, nfs, gluster, hdfs)
    volume plugin
    flocker - volume migration
    “durable” data (as reliable as host)
  • Upgrading Kubernetes
    master components
    kubelets
    OS + kernel + Docker
  • Usability
    Easy cluster startup
    Minion registration
    Configuring k8s
    move away from flags in master
    node config distribution
    kubelet config
    dockercfg
    Cluster scaling
    CLI + config + deployment / rolling updates
    Selected workloads
  • Networking
    External IPs
    DNS
    Kick-the-tires networking implementation
  • Admission control not required for 1.0
  • v1 API + deprecation policy
  • Kubelet API well defined and versioned
  • Basic resource-aware scheduling -- breakout
    require limits?
    auto-sizing
  • Registry
    Predictable deployment (config-time image resolution)
    Easy code->k8s
    Simple out-of-the box setup
    One or many?
    Proxy?
    Service?
    Configurable .dockercfg
  • Productionization
    Scalability
    100 for 1.0
    1000 by summer 2015
    HA master -- not gating 1.0
    Master election
    Eliminate global in-memory state
    IP allocator
    Operations
    Sharding
    Pod getter
    Kubelets need to coast when master down
    Don't blow away pods when the master is down
    Testing
    More/better/easier E2E
    E2E integration testing w/ OpenShift
    More non-E2E integration tests
    Long-term soaking / stress test
    Backward compatibility
    Release cadence and artifacts
    Export monitoring metrics (instrumentation)
    Bounded disk space on master and kubelets
    GC of unused images
  • Docs
    Reference architecture
  • Auth[nz]
    plugins + policy
    admin user->master
    master component->component: localhost in 1.0
    kubelet->master