From fabd20afce30e947425346fa2938ad0edfa8b867 Mon Sep 17 00:00:00 2001
From: Tim Hockin
Date: Fri, 17 Jul 2015 15:35:41 -0700
Subject: [PATCH] Run gendocs

---
 README.md | 1 +
 access.md | 15 ++++++++++++---
 admission_control.md | 1 +
 admission_control_limit_range.md | 2 ++
 admission_control_resource_quota.md | 2 ++
 architecture.md | 2 ++
 clustering.md | 2 ++
 clustering/README.md | 1 +
 command_execution_port_forwarding.md | 3 +++
 event_compression.md | 6 ++++++
 expansion.md | 1 +
 identifiers.md | 1 +
 namespaces.md | 1 +
 networking.md | 1 +
 persistent-storage.md | 1 +
 principles.md | 1 +
 resources.md | 10 ++++++++++
 security.md | 1 +
 security_context.md | 6 ++++++
 service_accounts.md | 6 +++++-
 simple-rolling-update.md | 9 +++++++++
 versioning.md | 1 +
 22 files changed, 70 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index b0f3115ae..62946cb6f 100644
--- a/README.md
+++ b/README.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Kubernetes Design Overview
 Kubernetes is a system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications.
diff --git a/access.md b/access.md
index e42d78597..9a0c0d3dc 100644
--- a/access.md
+++ b/access.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # K8s Identity and Access Management Sketch
 This document suggests a direction for identity and access management in the Kubernetes system.
@@ -43,6 +44,7 @@ High level goals are:
 - Ease integration with existing enterprise and hosted scenarios.
 ### Actors
+
 Each of these can act as normal users or attackers.
 - External Users: People who are accessing applications running on K8s (e.g. a web site served by webserver running in a container on K8s), but who do not have K8s API access.
 - K8s Users : People who access the K8s API (e.g. create K8s API objects like Pods)
@@ -51,6 +53,7 @@ Each of these can act as normal users or attackers.
 - K8s Admin means K8s Cluster Admins and K8s Project Admins taken together.
 ### Threats
+
 Both intentional attacks and accidental use of privilege are concerns.
 For both cases it may be useful to think about these categories differently:
@@ -81,6 +84,7 @@ K8s Cluster assets:
 This document is primarily about protecting K8s User assets and K8s cluster assets from other K8s Users and K8s Project and Cluster Admins.
 ### Usage environments
+
 Cluster in Small organization:
 - K8s Admins may be the same people as K8s Users.
 - few K8s Admins.
@@ -112,6 +116,7 @@ Pods configs should be largely portable between Org-run and hosted configuration
 # Design
+
 Related discussion:
 - https://github.com/GoogleCloudPlatform/kubernetes/issues/442
 - https://github.com/GoogleCloudPlatform/kubernetes/issues/443
@@ -125,7 +130,9 @@ K8s distribution should include templates of config, and documentation, for simp
 Features in this doc are divided into "Initial Feature", and "Improvements". Initial features would be candidates for version 1.00.
 ## Identity
-###userAccount
+
+### userAccount
+
 K8s will have a `userAccount` API object.
 - `userAccount` has a UID which is immutable. This is used to associate users with objects and to record actions in audit logs.
 - `userAccount` has a name which is a string and human readable and unique among userAccounts. It is used to refer to users in Policies, to ensure that the Policies are human readable. It can be changed only when there are no Policy objects or other objects which refer to that name. An email address is a suggested format for this field.
@@ -158,7 +165,8 @@ Enterprise Profile:
 - each service using the API has own `userAccount` too. (e.g. `scheduler`, `repcontroller`)
 - automated jobs to denormalize the ldap group info into the local system list of users into the K8s userAccount file.
-###Unix accounts
+### Unix accounts
+
 A `userAccount` is not a Unix user account.
 The fact that a pod is started by a `userAccount` does not mean that the processes in that pod's containers run as a Unix user with a corresponding name or identity.
 Initially:
@@ -170,7 +178,8 @@ Improvements:
 - requires docker to integrate user namespace support, and deciding what getpwnam() does for these uids.
 - any features that help users avoid use of privileged containers (https://github.com/GoogleCloudPlatform/kubernetes/issues/391)
-###Namespaces
+### Namespaces
+
 K8s will have a have a `namespace` API object. It is similar to a Google Compute Engine `project`. It provides a namespace for objects created by a group of people co-operating together, preventing name collisions with non-cooperating groups. It also serves as a reference point for authorization policies.
 Namespaces are described in [namespaces.md](namespaces.md).
diff --git a/admission_control.md b/admission_control.md
index aaa6ed164..c75d55359 100644
--- a/admission_control.md
+++ b/admission_control.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Kubernetes Proposal - Admission Control
 **Related PR:**
diff --git a/admission_control_limit_range.md b/admission_control_limit_range.md
index 90329815a..ccdb44d88 100644
--- a/admission_control_limit_range.md
+++ b/admission_control_limit_range.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Admission control plugin: LimitRanger
 ## Background
@@ -164,6 +165,7 @@ It is expected we will want to define limits for particular pods or containers b
 To make a **LimitRangeItem** more restrictive, we will intend to add these additional restrictions at a future point in time.
 ## Example
+
 See the [example of Limit Range](../user-guide/limitrange/) for more information.
diff --git a/admission_control_resource_quota.md b/admission_control_resource_quota.md
index d5cdc9a15..99d5431a1 100644
--- a/admission_control_resource_quota.md
+++ b/admission_control_resource_quota.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Admission control plugin: ResourceQuota
 ## Background
@@ -185,6 +186,7 @@ services 3 5
 ```
 ## More information
+
 See [resource quota document](../admin/resource-quota.md) and the [example of Resource Quota](../user-guide/resourcequota/) for more information.
diff --git a/architecture.md b/architecture.md
index 2e4afc622..f7c551719 100644
--- a/architecture.md
+++ b/architecture.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Kubernetes architecture
 A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable.
@@ -45,6 +46,7 @@ The Kubernetes node has the services necessary to run application containers and
 Each node runs Docker, of course. Docker takes care of the details of downloading images and running containers.
 ### Kubelet
+
 The **Kubelet** manages [pods](../user-guide/pods.md) and their containers, their images, their volumes, etc.
 ### Kube-Proxy
diff --git a/clustering.md b/clustering.md
index 8673284f4..1fcb8aa32 100644
--- a/clustering.md
+++ b/clustering.md
@@ -30,10 +30,12 @@ Documentation for other releases can be found at
+
 # Clustering in Kubernetes
 ## Overview
+
 The term "clustering" refers to the process of having all members of the kubernetes cluster find and trust each other. There are multiple different ways to achieve clustering with different security and usability profiles.
 This document attempts to lay out the user experiences for clustering that Kubernetes aims to address.
 Once a cluster is established, the following is true:
diff --git a/clustering/README.md b/clustering/README.md
index f05168d66..53649a31b 100644
--- a/clustering/README.md
+++ b/clustering/README.md
@@ -41,6 +41,7 @@ pip install seqdiag
 Just call `make` to regenerate the diagrams.
 ## Building with Docker
+
 If you are on a Mac or your pip install is messed up, you can easily build with docker.
 ```
diff --git a/command_execution_port_forwarding.md b/command_execution_port_forwarding.md
index c7408b58b..1d319adf7 100644
--- a/command_execution_port_forwarding.md
+++ b/command_execution_port_forwarding.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Container Command Execution & Port Forwarding in Kubernetes
 ## Abstract
@@ -87,12 +88,14 @@ won't be able to work with this mechanism, unless adapters can be written.
 ## Process Flow
 ### Remote Command Execution Flow
+
 1. The client connects to the Kubernetes Master to initiate a remote command execution request
 2. The Master proxies the request to the Kubelet where the container lives
 3. The Kubelet executes nsenter + the requested command and streams stdin/stdout/stderr back and forth between the client and the container
 ### Port Forwarding Flow
+
 1. The client connects to the Kubernetes Master to initiate a remote command execution request
 2. The Master proxies the request to the Kubelet where the container lives
diff --git a/event_compression.md b/event_compression.md
index af823972b..29e659170 100644
--- a/event_compression.md
+++ b/event_compression.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Kubernetes Event Compression
 This document captures the design of event compression.
@@ -40,11 +41,13 @@ This document captures the design of event compression.
 Kubernetes components can get into a state where they generate tons of events which are identical except for the timestamp. For example, when pulling a non-existing image, Kubelet will repeatedly generate ```image_not_existing``` and ```container_is_waiting``` events until upstream components correct the image. When this happens, the spam from the repeated events makes the entire event mechanism useless. It also appears to cause memory pressure in etcd (see [#3853](https://github.com/GoogleCloudPlatform/kubernetes/issues/3853)).
 ## Proposal
+
 Each binary that generates events (for example, ```kubelet```) should keep track of previously generated events so that it can collapse recurring events into a single event instead of creating a new instance for each new event.
 Event compression should be best effort (not guaranteed). Meaning, in the worst case, ```n``` identical (minus timestamp) events may still result in ```n``` event entries.
 ## Design
+
 Instead of a single Timestamp, each event object [contains](../../pkg/api/types.go#L1111) the following fields:
 * ```FirstTimestamp util.Time```
 * The date/time of the first occurrence of the event.
@@ -78,11 +81,13 @@ Each binary that generates events:
 * An entry for the event is also added to the previously generated events cache.
 ## Issues/Risks
+
 * Compression is not guaranteed, because each component keeps track of event history in memory
 * An application restart causes event history to be cleared, meaning event history is not preserved across application restarts and compression will not occur across component restarts.
 * Because an LRU cache is used to keep track of previously generated events, if too many unique events are generated, old events will be evicted from the cache, so events will only be compressed until they age out of the events cache, at which point any new instance of the event will cause a new entry to be created in etcd.
 ## Example
+
 Sample kubectl output
 ```
@@ -104,6 +109,7 @@ Thu, 12 Feb 2015 01:13:20 +0000 Thu, 12 Feb 2015 01:13:20 +0000 1
 This demonstrates what would have been 20 separate entries (indicating scheduling failure) collapsed/compressed down to 5 entries.
 ## Related Pull Requests/Issues
+
 * Issue [#4073](https://github.com/GoogleCloudPlatform/kubernetes/issues/4073): Compress duplicate events
 * PR [#4157](https://github.com/GoogleCloudPlatform/kubernetes/issues/4157): Add "Update Event" to Kubernetes API
 * PR [#4206](https://github.com/GoogleCloudPlatform/kubernetes/issues/4206): Modify Event struct to allow compressing multiple recurring events in to a single event
diff --git a/expansion.md b/expansion.md
index 5cc08c6cd..096b8a9d8 100644
--- a/expansion.md
+++ b/expansion.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Variable expansion in pod command, args, and env
 ## Abstract
diff --git a/identifiers.md b/identifiers.md
index eda7254be..9e2699936 100644
--- a/identifiers.md
+++ b/identifiers.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Identifiers and Names in Kubernetes
 A summarization of the goals and recommendations for identifiers in Kubernetes. Described in [GitHub issue #199](https://github.com/GoogleCloudPlatform/kubernetes/issues/199).
diff --git a/namespaces.md b/namespaces.md
index 7bd7ab67b..1f1a767c6 100644
--- a/namespaces.md
+++ b/namespaces.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Namespaces
 ## Abstract
diff --git a/networking.md b/networking.md
index ac6e57946..d7822d4d8 100644
--- a/networking.md
+++ b/networking.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Networking
 There are 4 distinct networking problems to solve:
diff --git a/persistent-storage.md b/persistent-storage.md
index f919baa9e..3e9edd3ef 100644
--- a/persistent-storage.md
+++ b/persistent-storage.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Persistent Storage
 This document proposes a model for managing persistent, cluster-scoped storage for applications requiring long lived data.
diff --git a/principles.md b/principles.md
index 1ae3bc3a1..c208fb6b4 100644
--- a/principles.md
+++ b/principles.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Design Principles
 Principles to follow when extending Kubernetes.
diff --git a/resources.md b/resources.md
index 2effb5cf5..055c5d86e 100644
--- a/resources.md
+++ b/resources.md
@@ -48,6 +48,7 @@ The resource model aims to be:
 * precise, to avoid misunderstandings and promote pod portability.
 ## The resource model
+
 A Kubernetes _resource_ is something that can be requested by, allocated to, or consumed by a pod or container. Examples include memory (RAM), CPU, disk-time, and network bandwidth.
 Once resources on a node have been allocated to one pod, they should not be allocated to another until that pod is removed or exits. This means that Kubernetes schedulers should ensure that the sum of the resources allocated (requested and granted) to its pods never exceeds the usable capacity of the node. Testing whether a pod will fit on a node is called _feasibility checking_.
@@ -124,9 +125,11 @@ Where:
 ## Kubernetes-defined resource types
+
 The following resource types are predefined ("reserved") by Kubernetes in the `kubernetes.io` namespace, and so cannot be used for user-defined resources. Note that the syntax of all resource types in the resource spec is deliberately similar, but some resource types (e.g., CPU) may receive significantly more support than simply tracking quantities in the schedulers and/or the Kubelet.
 ### Processor cycles
+
 * Name: `cpu` (or `kubernetes.io/cpu`)
 * Units: Kubernetes Compute Unit seconds/second (i.e., CPU cores normalized to a canonical "Kubernetes CPU")
 * Internal representation: milli-KCUs
@@ -141,6 +144,7 @@ Note that requesting 2 KCU won't guarantee that precisely 2 physical cores will
 ### Memory
+
 * Name: `memory` (or `kubernetes.io/memory`)
 * Units: bytes
 * Compressible? no (at least initially)
@@ -152,6 +156,7 @@ rather than decimal ones: "64MiB" rather than "64MB".
 ## Resource metadata
+
 A resource type may have an associated read-only ResourceType structure, that contains metadata about the type. For example:
 ```
@@ -222,16 +227,19 @@ and predicted
 ## Future resource types
 ### _[future] Network bandwidth_
+
 * Name: "network-bandwidth" (or `kubernetes.io/network-bandwidth`)
 * Units: bytes per second
 * Compressible? yes
 ### _[future] Network operations_
+
 * Name: "network-iops" (or `kubernetes.io/network-iops`)
 * Units: operations (messages) per second
 * Compressible? yes
 ### _[future] Storage space_
+
 * Name: "storage-space" (or `kubernetes.io/storage-space`)
 * Units: bytes
 * Compressible? no
@@ -239,6 +247,7 @@ and predicted
 The amount of secondary storage space available to a container. The main target is local disk drives and SSDs, although this could also be used to qualify remotely-mounted volumes. Specifying whether a resource is a raw disk, an SSD, a disk array, or a file system fronting any of these, is left for future work.
 ### _[future] Storage time_
+
 * Name: storage-time (or `kubernetes.io/storage-time`)
 * Units: seconds per second of disk time
 * Internal representation: milli-units
@@ -247,6 +256,7 @@ The amount of secondary storage space available to a container. The main target
 This is the amount of time a container spends accessing disk, including actuator and transfer time. A standard disk drive provides 1.0 diskTime seconds per second.
 ### _[future] Storage operations_
+
 * Name: "storage-iops" (or `kubernetes.io/storage-iops`)
 * Units: operations per second
 * Compressible? yes
diff --git a/security.md b/security.md
index 2989148bb..522ff4ca5 100644
--- a/security.md
+++ b/security.md
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
+
 # Security in Kubernetes
 Kubernetes should define a reasonable set of security best practices that allows processes to be isolated from each other, from the cluster infrastructure, and which preserves important boundaries between those who manage the cluster, and those who use the cluster.
diff --git a/security_context.md b/security_context.md
index bc76495a3..03213927e 100644
--- a/security_context.md
+++ b/security_context.md
@@ -30,8 +30,11 @@ Documentation for other releases can be found at
+
 # Security Contexts
+
 ## Abstract
+
 A security context is a set of constraints that are applied to a container in order to achieve the following goals (from [security design](security.md)):
 1. Ensure a clear isolation between container and the underlying host it runs on
@@ -53,11 +56,13 @@ to the container process.
 Support for user namespaces has recently been [merged](https://github.com/docker/libcontainer/pull/304) into Docker's libcontainer project and should soon surface in Docker itself. It will make it possible to assign a range of unprivileged uids and gids from the host to each container, improving the isolation between host and container and between containers.
 ### External integration with shared storage
+
 In order to support external integration with shared storage, processes running in a Kubernetes cluster should be able to be uniquely identified by their Unix UID, such that a chain of ownership can be established. Processes in pods will need to have consistent UID/GID/SELinux category labels in order to access shared disks.
 ## Constraints and Assumptions
+
 * It is out of the scope of this document to prescribe a specific set of constraints to isolate containers from their host. Different use cases need different settings.
@@ -96,6 +101,7 @@ be addressed with security contexts:
 ## Proposed Design
 ### Overview
+
 A *security context* consists of a set of constraints that determine how a container is secured before getting created and run. A security context resides on the container and represents the runtime parameters that will be used to create and run the container via container APIs.
 A *security context provider* is passed to the Kubelet so it can have a chance
diff --git a/service_accounts.md b/service_accounts.md
index c6acbd248..d9535de5a 100644
--- a/service_accounts.md
+++ b/service_accounts.md
@@ -30,7 +30,8 @@ Documentation for other releases can be found at
-#Service Accounts
+
+# Service Accounts
 ## Motivation
@@ -50,6 +51,7 @@ They also may interact with services other than the Kubernetes API, such as:
 - accessing files in an NFS volume attached to the pod
 ## Design Overview
+
 A service account binds together several things:
 - a *name*, understood by users, and perhaps by peripheral systems, for an identity
 - a *principal* that can be authenticated and [authorized](../admin/authorization.md)
@@ -137,6 +139,7 @@ are added to the map of tokens used by the authentication process in the apiserv
 might have some types that do not do anything on apiserver but just get pushed to the kubelet.)
 ### Pods
+
 The `PodSpec` is extended to have a `Pods.Spec.ServiceAccountUsername` field.
 If this is unset, then a default value is chosen. If it is set, then the corresponding value of `Pods.Spec.SecurityContext` is set by the Service Account Finalizer (see below).
@@ -144,6 +147,7 @@ Service Account Finalizer (see below).
 TBD: how policy limits which users can make pods with which service accounts.
 ### Authorization
+
 Kubernetes API Authorization Policies refer to users. Pods created with a `Pods.Spec.ServiceAccountUsername` typically get a `Secret` which allows them to authenticate to the Kubernetes APIserver as a particular user. So any policy that is desired can be applied to them.
diff --git a/simple-rolling-update.md b/simple-rolling-update.md
index b142c6e51..80bc65666 100644
--- a/simple-rolling-update.md
+++ b/simple-rolling-update.md
@@ -30,12 +30,15 @@ Documentation for other releases can be found at
+
 ## Simple rolling update
+
 This is a lightweight design document for simple [rolling update](../user-guide/kubectl/kubectl_rolling-update.md) in ```kubectl```.
 Complete execution flow can be found [here](#execution-details). See the [example of rolling update](../user-guide/update-demo/) for more information.
 ### Lightweight rollout
+
 Assume that we have a current replication controller named ```foo``` and it is running image ```image:v1```
 ```kubectl rolling-update foo [foo-v2] --image=myimage:v2```
@@ -51,6 +54,7 @@ and the old 'foo' replication controller is deleted. For the purposes of the ro
 The value of that label is the hash of the complete JSON representation of the```foo-next``` or```foo``` replication controller. The name of this label can be overridden by the user with the ```--deployment-label-key``` flag.
 #### Recovery
+
 If a rollout fails or is terminated in the middle, it is important that the user be able to resume the roll out.
 To facilitate recovery in the case of a crash of the updating process itself, we add the following annotations to each replication controller in the ```kubernetes.io/``` annotation namespace:
 * ```desired-replicas``` The desired number of replicas for this replication controller (either N or zero)
@@ -68,6 +72,7 @@ it is assumed that the rollout is nearly completed, and ```foo-next``` is rename
 ### Aborting a rollout
+
 Abort is assumed to want to reverse a rollout in progress.
 ```kubectl rolling-update foo [foo-v2] --rollback```
@@ -87,6 +92,7 @@ If the user doesn't specify a ```foo-next``` name, then it is either discovered
 then ```foo-next``` is synthesized using the pattern ```-```
 #### Initialization
+
 * If ```foo``` and ```foo-next``` do not exist:
 * Exit, and indicate an error to the user, that the specified controller doesn't exist.
 * If ```foo``` exists, but ```foo-next``` does not:
@@ -102,6 +108,7 @@ then ```foo-next``` is synthesized using the pattern ```- 0
@@ -109,11 +116,13 @@ then ```foo-next``` is synthesized using the pattern ```-
+
 # Kubernetes API and Release Versioning
 Legend: