Merge pull request #6933 from bgrant0607/docfix

Update docs. Add design principles.
This commit is contained in:
Daniel Smith 2015-04-16 16:55:10 -07:00
commit f61c721244
3 changed files with 116 additions and 0 deletions

17
README.md Normal file
View File

@ -0,0 +1,17 @@
# Kubernetes Design Overview
Kubernetes is a system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications.
Kubernetes establishes robust declarative primitives for maintaining the desired state requested by the user. We see these primitives as the main value added by Kubernetes. Self-healing mechanisms, such as auto-restarting, re-scheduling, and replicating containers require active controllers, not just imperative orchestration.
Kubernetes is primarily targeted at applications composed of multiple containers, such as elastic, distributed micro-services. It is also designed to facilitate migration of non-containerized application stacks to Kubernetes. It therefore includes abstractions for grouping containers in both loosely coupled and tightly coupled formations, and provides ways for containers to find and communicate with each other in relatively familiar ways.
Kubernetes enables users to ask a cluster to run a set of containers. The system automatically chooses hosts to run those containers on. While Kubernetes's scheduler is currently very simple, we expect it to grow in sophistication over time. Scheduling is a policy-rich, topology-aware, workload-specific function that significantly impacts availability, performance, and capacity. The scheduler needs to take into account individual and collective resource requirements, quality of service requirements, hardware/software/policy constraints, affinity and anti-affinity specifications, data locality, inter-workload interference, deadlines, and so on. Workload-specific requirements will be exposed through the API as necessary.
Kubernetes is intended to run on a number of cloud providers, as well as on physical hosts.
A single Kubernetes cluster is not intended to span multiple availability zones. Instead, we recommend building a higher-level layer to replicate complete deployments of highly available applications across multiple zones (see [the availability doc](../availability.md) and [cluster federation proposal](../proposals/federation.md) for more details).
Finally, Kubernetes aspires to be an extensible, pluggable, building-block OSS platform and toolkit. Therefore, architecturally, we want Kubernetes to be built as a collection of pluggable components and layers, with the ability to use alternative schedulers, controllers, storage systems, and distribution mechanisms, and we're evolving its current code in that direction. Furthermore, we want others to be able to extend Kubernetes functionality, such as with higher-level PaaS functionality or multi-cluster layers, without modification of core Kubernetes source. Therefore, its API isn't just (or even necessarily mainly) targeted at end users, but at tool and extension developers. Its APIs are intended to serve as the foundation for an open ecosystem of tools, automation systems, and higher-level API layers. Consequently, there are no "internal" inter-component APIs. All APIs are visible and available, including the APIs used by the scheduler, the node controller, the replication-controller manager, Kubelet's API, etc. There's no glass to break -- in order to handle more complex use cases, one can just access the lower-level APIs in a fully transparent, composable manner.
For more about the Kubernetes architecture, see [architecture](architecture.md).

44
architecture.md Normal file
View File

@ -0,0 +1,44 @@
# Kubernetes architecture
A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable.
![Architecture Diagram](../architecture.png?raw=true "Architecture overview")
## The Kubernetes Node
When looking at the architecture of the system, we'll break it down to services that run on the worker node and services that compose the cluster-level control plane.
The Kubernetes node has the services necessary to run application containers and be managed from the master systems.
Each node runs Docker, of course. Docker takes care of the details of downloading images and running containers.
### Kubelet
The **Kubelet** manages [pods](../pods.md) and their containers, their images, their volumes, etc.
### Kube-Proxy
Each node also runs a simple network proxy and load balancer (see the [services FAQ](https://github.com/GoogleCloudPlatform/kubernetes/wiki/Services-FAQ) for more details). This reflects `services` (see [the services doc](../docs/services.md) for more details) as defined in the Kubernetes API on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends.
Service endpoints are currently found via [DNS](../dns.md) or through environment variables (both [Docker-links-compatible](https://docs.docker.com/userguide/dockerlinks/) and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy.
## The Kubernetes Control Plane
The Kubernetes control plane is split into a set of components. Currently they all run on a single _master_ node, but that is expected to change soon in order to support high-availability clusters. These components work together to provide a unified view of the cluster.
### etcd
All persistent master state is stored in an instance of `etcd`. This provides a great way to store configuration data reliably. With `watch` support, coordinating components can be notified very quickly of changes.
### Kubernetes API Server
The apiserver serves up the [Kubernetes API](../api.md). It is intended to be a CRUD-y server, with most/all business logic implemented in separate components or in plug-ins. It mainly processes REST operations, validates them, and updates the corresponding objects in `etcd` (and eventually other stores).
### Scheduler
The scheduler binds unscheduled pods to nodes via the `/binding` API. The scheduler is pluggable, and we expect to support multiple cluster schedulers and even user-provided schedulers in the future.
### Kubernetes Controller Manager Server
All other cluster-level functions are currently performed by the Controller Manager. For instance, `Endpoints` objects are created and updated by the endpoints controller, and nodes are discovered, managed, and monitored by the node controller. These could eventually be split into separate components to make them independently pluggable.
The [`replicationController`](../replication-controller.md) is a mechanism that is layered on top of the simple [`pod`](../pods.md) API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented.

55
principles.md Normal file
View File

@ -0,0 +1,55 @@
# Design Principles
Principles to follow when extending Kubernetes.
## API
See also the [API conventions](../api-conventions.md).
* All APIs should be declarative.
* API objects should be complementary and composable, not opaque wrappers.
* The control plane should be transparent -- there are no hidden internal APIs.
* The cost of API operations should be proportional to the number of objects intentionally operated upon. Therefore, common filtered lookups must be indexed. Beware of patterns of multiple API calls that would incur quadratic behavior.
* Object status must be 100% reconstructable by observation. Any history kept must be just an optimization and not required for correct operation.
* Cluster-wide invariants are difficult to enforce correctly. Try not to add them. If you must have them, don't enforce them atomically in master components, that is contention-prone and doesn't provide a recovery path in the case of a bug allowing the invariant to be violated. Instead, provide a series of checks to reduce the probability of a violation, and make every component involved able to recover from an invariant violation.
* Low-level APIs should be designed for control by higher-level systems. Higher-level APIs should be intent-oriented (think SLOs) rather than implementation-oriented (think control knobs).
## Control logic
* Functionality must be *level-based*, meaning the system must operate correctly given the desired state and the current/observed state, regardless of how many intermediate state updates may have been missed. Edge-triggered behavior must be just an optimization.
* Assume an open world: continually verify assumptions and gracefully adapt to external events and/or actors. Example: we allow users to kill pods under control of a replication controller; it just replaces them.
* Do not define comprehensive state machines for objects with behaviors associated with state transitions and/or "assumed" states that cannot be ascertained by observation.
* Don't assume a component's decisions will not be overridden or rejected, nor for the component to always understand why. For example, etcd may reject writes. Kubelet may reject pods. The scheduler may not be able to schedule pods. Retry, but back off and/or make alternative decisions.
* Components should be self-healing. For example, if you must keep some state (e.g., cache) the content needs to be periodically refreshed, so that if an item does get erroneously stored or a deletion event is missed etc, it will be soon fixed, ideally on timescales that are shorter than what will attract attention from humans.
* Component behavior should degrade gracefully. Prioritize actions so that the most important activities can continue to function even when overloaded and/or in states of partial failure.
## Architecture
* Only the apiserver should communicate with etcd/store, and not other components (scheduler, kubelet, etc.).
* Compromising a single node shouldn't compromise the cluster.
* Components should continue to do what they were last told in the absence of new instructions (e.g., due to network partition or component outage).
* All components should keep all relevant state in memory all the time. The apiserver should write through to etcd/store, other components should write through to the apiserver, and they should watch for updates made by other clients.
* Watch is preferred over polling.
## Extensibility
TODO: pluggability
## Bootstrapping
* [Self-hosting](https://github.com/GoogleCloudPlatform/kubernetes/issues/246) of all components is a goal.
* Minimize the number of dependencies, particularly those required for steady-state operation.
* Stratify the dependencies that remain via principled layering.
* Break any circular dependencies by converting hard dependencies to soft dependencies.
* Also accept that data from other components from another source, such as local files, which can then be manually populated at bootstrap time and then continuously updated once those other components are available.
* State should be rediscoverable and/or reconstructable.
* Make it easy to run temporary, bootstrap instances of all components in order to create the runtime state needed to run the components in the steady state; use a lock (master election for distributed components, file lock for local components like Kubelet) to coordinate handoff. We call this technique "pivoting".
* Have a solution to restart dead components. For distributed components, replication works well. For local components such as Kubelet, a process manager or even a simple shell loop works.
## Availability
TODO
## General principles
* [Eric Raymond's 17 UNIX rules](https://en.wikipedia.org/wiki/Unix_philosophy#Eric_Raymond.E2.80.99s_17_Unix_Rules)