From f6e1200b496dbf637081c5b88f0c9d31049e2a2f Mon Sep 17 00:00:00 2001
From: Brian Grant
Date: Wed, 30 Jan 2019 23:45:22 -0800
Subject: [PATCH] Added document explaining the scope of the Kubernetes
 project, including historical rationales and decision criteria.

---
 .../design-proposals/architecture/scope.md | 342 ++++++++++++++++++
 1 file changed, 342 insertions(+)
 create mode 100644 contributors/design-proposals/architecture/scope.md

diff --git a/contributors/design-proposals/architecture/scope.md b/contributors/design-proposals/architecture/scope.md
new file mode 100644
index 000000000..c81f6cc76
--- /dev/null
+++ b/contributors/design-proposals/architecture/scope.md
@@ -0,0 +1,342 @@
# Kubernetes scope

Purpose of this doc: Clarify the factors affecting decisions regarding
what is and is not in scope for the Kubernetes project.

Related documents:
* [What is Kubernetes?](https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/)
* [Kubernetes design and architecture](architecture.md)
* [Kubernetes architectural roadmap (2017)](architectural-roadmap.md)
* [Design principles](principles.md)
* [Kubernetes resource management](resource-management.md)

Kubernetes is a portable, extensible open-source platform for managing
containerized workloads and services that facilitates both
declarative configuration and automation. Workload portability is an
especially high priority. Kubernetes provides a flexible, easy-to-run,
secure foundation for running containerized applications on any cloud
provider or on your own systems.

While Kubernetes is not a full distribution in the Linux sense, its
adoption has been facilitated by the fact that the upstream releases
are usable on their own, with minimal dependencies (e.g., etcd, a
container runtime, and a networking implementation).

The high-level scope and goals are often insufficient for making
decisions about where to draw the line, so this document records where
the line is, the rationale for some past decisions, and some general
criteria that have been applied, including non-technical
considerations. For instance, user adoption and continued operation of
the project itself are also important factors.

## Significant areas

More details can be found below, but a concise list of areas in scope follows:
* Containerized workload execution and management
* Service discovery, load balancing, and routing
* Workload identity propagation and authentication
* Declarative resource management platform
* Command-line tool
* Web dashboard (UI)
* Cluster lifecycle tools
* Extensibility to support execution and management in diverse environments
* Multi-cluster management tools and systems
* Project GitHub automation and other process automation
* Project continuous build and test infrastructure
* Release tooling
* Documentation
* Usage data collection mechanisms

## Scope domains

Most decisions concern whether any part of the project should
undertake efforts in a particular area, but some decisions need to be
made at narrower scopes. The term "core" is sometimes used for such
scopes, but is not well defined.
The following are scopes that may be relevant:
* Kubernetes project GitHub orgs
  * All GitHub orgs
  * The kubernetes GitHub org
  * The kubernetes-sigs and kubernetes-incubator GitHub orgs
  * The kubernetes-client GitHub org
  * Other GitHub orgs
* Release artifacts
  * The Kubernetes release bundle
  * Binaries built in kubernetes/kubernetes
    * “core” server components: apiserver, controller manager, scheduler, kube-proxy, kubelet
    * kubectl
    * kubeadm
  * Other images, packages, etc.
* The kubernetes/kubernetes repository (aka k/k)
  * master branch
    * kubernetes/kubernetes/master/pkg
    * kubernetes/kubernetes/master/staging
* [Functionality layers](architectural-roadmap.md)
  * required
  * pluggable
  * optional
  * usable independently of the rest of Kubernetes

## Other inclusion considerations

The Kubernetes project is a large, complex effort, so decisions about
what to include weigh a number of questions:

* Is the functionality consistent with the existing implementation
  conventions, design principles, architecture, and direction?

* Do the subproject owners, approvers, reviewers, and regular contributors
  agree to maintain the functionality?

* Do the contributors to the functionality agree to follow the
  project’s development conventions and requirements, including the CLA,
  code of conduct, GitHub and build tooling, testing, documentation,
  and release criteria?

* Does the functionality improve existing use cases, or mostly enable
  new ones? The project isn't blocking new functionality entirely
  (more reducing the rate of expansion), but it is trying to
  limit additions to kubernetes/kubernetes/master, and aims to improve the
  quality of the functionality that already exists.

* Is it needed by project contributors? Example: We need cluster
  creation and upgrade functionality in order to run end-to-end tests.

* Is it necessary in order to enable workload portability?

* Is it needed in order for upstream releases to be usable? For
  example, functionality that users would otherwise have to
  reverse-engineer Kubernetes to figure out, and/or copy code
  out of Kubernetes itself to make work.

* Is it functionality that users expect, for instance because other
  container platforms and/or service discovery and routing mechanisms
  provide it? If a capability that relates to Kubernetes's fundamental
  purpose were to become table stakes in the industry, Kubernetes
  would need to support it in order to stay relevant. (Whether it
  would need to be addressed by the core project would depend on the
  other criteria.)

* Is there sufficiently broad user demand and/or sufficient expected
  user benefit for the functionality?

* Is there an adequate mechanism to discover, deploy, express a
  dependency on, and upgrade the functionality if it is implemented
  using an extension mechanism? Are there consistent notions of releases,
  maturity, quality, version skew, conformance, etc. for extensions?

* Is it needed as a reference implementation exercising extension
  points or other APIs?

* Is the functionality sufficiently general-purpose?

* Is it an area where we want to provide an opinionated solution
  and/or where fragmentation would be problematic for users, or are
  there many reasonable alternative approaches and solutions to the
  problem?

* Is it an area where we want to foster exploration and innovation in
  the ecosystem?

* Has the ecosystem produced adequate solutions on its own? For
  instance, have ecosystem projects taken on requirements of the
  Kubernetes project, if needed?
  Example: etcd3 added a number of features
  and other improvements to benefit Kubernetes, so the project didn't
  need to launch a separate storage effort.

* Is there an acceptable home for the recommended ecosystem solution(s)?
  Example: the [CNCF Sandbox](https://github.com/cncf/toc/blob/master/process/sandbox.md) is one possible home.

* Has the functionality been provided by the project/release/component
  historically?

## Technical scope details and rationale

### Containerized workload execution and management

Including:
* common general categories of workloads, such as stateless, stateful, batch, and cluster services
* provisioning, allocation, accessing, and managing compute, storage, and network resources on behalf of the workloads, and enforcement of security policies on those resources
* workload prioritization, capacity assessment, placement, and relocation (aka scheduling)
* graceful workload eviction
* local container image caching
* configuration and secret distribution
* manual and automatic horizontal and vertical scaling
* deployment, progressive (aka rolling) upgrades, and downgrades
* self-healing
* exposing container logs, status, health, and resource usage metrics for collection

### Service discovery, load balancing, and routing

Including:
* endpoint tracking and discovery, including pod and non-pod endpoints
* the most common L4 and L7 Internet protocols (TCP, UDP, SCTP, HTTP, HTTPS)
* intra-cluster DNS configuration and serving
* external DNS configuration
* accessing external services (e.g., imported services, Open Service Broker)
* exposing traffic latency, throughput, and status metrics for collection
* access authorization

### Workload identity propagation and authentication

Including:
* internal identity (e.g., SPIFFE support)
* external identity (e.g., TLS certificate management)

### Declarative resource management platform

Including:
* CRUD API operations and behaviors, diff, patch, dry run, watch
* declarative updates (apply)
* resource type definition, registration, discovery, documentation, and validation mechanisms
* pluggable authentication, authorization, admission (API-level policy enforcement), and audit-logging mechanisms
* Namespace (resource scoping primitive) lifecycle
* resource instance persistence and garbage collection
* asynchronous event reporting
* API producer SDK
* API client SDK / libraries in widely used languages (see the sketch after this list)
* dynamic, resource-oriented CLI, as a reference implementation for interacting with the API and a basic tool for declarative and imperative management
  * simplifies getting started and avoids the complexities of documenting the system with just, for instance, curl
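
To make the shape of this resource-oriented API concrete, here is a
minimal sketch of the list and watch operations using the Go client SDK
(client-go). It is illustrative rather than normative: the kubeconfig
location, the `default` namespace, and the choice of ConfigMaps are
assumptions made for the example, and exact method signatures (e.g.,
the `context` parameter) vary across client-go releases.

```go
// A minimal, illustrative list+watch sketch against the Kubernetes API.
// Assumptions: a kubeconfig at the default location (~/.kube/config), a
// reachable cluster, and signatures matching recent client-go releases.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// CRUD (read): list ConfigMaps in the "default" namespace.
	cms, err := client.CoreV1().ConfigMaps("default").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, cm := range cms.Items {
		fmt.Println("existing:", cm.Name)
	}

	// Watch: stream changes that occur after the snapshot taken by the
	// list, by resuming from the list's resource version.
	w, err := client.CoreV1().ConfigMaps("default").Watch(ctx, metav1.ListOptions{
		ResourceVersion: cms.ResourceVersion,
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()
	for event := range w.ResultChan() {
		fmt.Println("event:", event.Type)
	}
}
```

This list-then-watch pattern is the same one the project's controllers
build on, which is one reason consistent API behaviors across resource
types are in scope.
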
### Command-line tool

Since some Kubernetes primitives are fairly low-level, in addition to
general-purpose resource-oriented operations, the CLI also supports
“porcelain” for common, simple, domain-specific operations (both
status/progress extraction and mutations) that don’t have discrete API
implementations, such as run, expose, rollout, cp, top, cordon, and
drain. The CLI also needs to support non-resource-oriented APIs,
such as exec, logs, attach, port-forward, and proxy.

### Web dashboard (UI)

The project supported a dashboard, initially built into the apiserver,
almost from the beginning. Other projects in the space had UIs and
users expected one. There wasn’t a vendor-neutral one in the
ecosystem, however, and a solution was needed for the project's local
cluster environment, minikube. The dashboard has also served as a UI
reference implementation and a vehicle to drive conventions (e.g.,
around resource category terminology), and it has been useful as a
tool for demonstrating and learning about Kubernetes concepts,
features, and behaviors.

### Cluster lifecycle tools

Cluster lifecycle includes provisioning, bootstrapping,
upgrade/downgrade, and teardown. The project develops several such tools.
Tools are needed for the following scenarios/purposes:
* usability of upstream releases: at least one solution that can be used to bootstrap the upstream release (e.g., kubeadm)
* testing: solutions that can be used to run multi-node end-to-end tests (e.g., kind), integration tests, upgrade/downgrade tests, version-skew tests, scalability tests, and other types of tests the project deems necessary to ensure adequate release quality
* portable, low-dependency local environment: at least one local environment (e.g., minikube), in order to simplify documentation tutorials that require a cluster to exist

### Extensibility to support execution and management in diverse environments

Including (the general shape of these extension points is sketched after this list):
* CRI
* CNI
* CSI
* external cloud providers
* KMS providers
* OSB brokers
* Cluster APIs
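
These extension points share a common shape: the project defines a
narrow interface or protocol, and implementations live outside the
core. The following is a toy Go sketch of that pattern only, not any
real Kubernetes interface; the `VolumePlugin` interface and
`hostPathPlugin` type are invented for illustration (the real CSI, for
instance, is a gRPC protocol rather than a Go interface).

```go
// A toy illustration of the extension-point pattern: the platform
// depends only on a narrow interface, and concrete implementations
// (here, hostPathPlugin) can live and evolve out of tree.
package main

import "fmt"

// VolumePlugin is a hypothetical, minimal extension interface.
type VolumePlugin interface {
	Name() string
	Mount(volumeID, targetPath string) error
}

// hostPathPlugin is a hypothetical out-of-tree implementation.
type hostPathPlugin struct{}

func (hostPathPlugin) Name() string { return "example.com/hostpath" }

func (hostPathPlugin) Mount(volumeID, targetPath string) error {
	// A real plugin would perform the mount; this toy just reports it.
	fmt.Printf("mounting %s at %s\n", volumeID, targetPath)
	return nil
}

func main() {
	// The platform side programs against the interface, never the
	// concrete type, so implementations are swappable.
	var p VolumePlugin = hostPathPlugin{}
	if err := p.Mount("vol-1", "/mnt/data"); err != nil {
		panic(err)
	}
}
```
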
### Multi-cluster management tools and systems

Many users desire to operate in and deploy applications to multiple
geographic locations and environments, even across multiple providers.
This generally requires managing multiple Kubernetes clusters. While
general deployment pipeline tools and continuous deployment systems
are not in scope, the project has explored multiple mechanisms to
simplify management of resources across multiple clusters, including
Federation v1, Federation v2, and the Cluster Registry API.

### Project GitHub automation and other process automation

As one of the largest, most active projects on GitHub, Kubernetes has
some extreme automation needs.

Including:
* prow
* gubernator
* velodrome and kettle
* website infrastructure
* k8s.io

### Project continuous build and test infrastructure

Including:
* prow
* tide
* triage dashboard

### Release tooling

Including:
* anago

### Documentation

Documentation of project-provided functionality and components, for
multiple audiences, including:
* application developers
* application operators
* cluster operators
* ecosystem developers
* distribution providers, and others who want to port Kubernetes to new environments
* project contributors

### Usage data collection mechanisms

Including:
* Spartakus

## Examples of projects and areas not in scope

Some of these are obvious, but many have been seriously deliberated in the
past.
* The resource instance store (etcd)
* Container runtimes, other than current grandfathered ones
* Network and storage plugins, other than current grandfathered ones
* CoreDNS
  * Since intra-cluster DNS is in scope, we need to ensure we have
    some solution, which has been kube-dns, but now that there is an
    adequate alternative outside the project, we are adopting it.
* Service load balancers (e.g., Envoy, Linkerd), other than kube-proxy
* Cloud provider implementations, other than current grandfathered ones
* Container image build tools
* Image registries and distribution mechanisms
* Identity (user/group) sources of truth (e.g., LDAP)
* Key management systems (e.g., Vault)
* CI, CD, and GitOps (push to deploy) systems, other than
  infrastructure used to build and test the Kubernetes project itself
* Application-level services, such as middleware (e.g., message
  buses), data-processing frameworks (e.g., Spark), machine-learning
  frameworks (e.g., Kubeflow), databases (e.g., MySQL), caches, and
  cluster storage systems (e.g., Ceph) as built-in services. Such
  components can run on Kubernetes, and/or can be accessed by
  applications running on Kubernetes through portable mechanisms, such
  as the Open Service Broker. Application-specific Operators (e.g.,
  Cassandra Operator) are also not in scope.
* Application and cluster log aggregation and searching, application
  and cluster monitoring aggregation and dashboarding (other than
  heapster, which is grandfathered), alerting, application performance
  management, tracing, and debugging tools
* General-purpose machine configuration (e.g., Chef, Puppet, Ansible,
  Salt), maintenance, automation (e.g., Rundeck), and management systems
* Templating and configuration languages (e.g., Jinja, Jsonnet,
  Starlark, HCL, Dhall, HOCON)
* File packaging tools (e.g., Helm, kpm, kubepack, duffle)
* Managing non-containerized applications in VMs, and other general
  IaaS functionality
* Full Platform as a Service functionality
* Full Functions as a Service functionality
* [Workflow
  orchestration](https://github.com/kubernetes/kubernetes/pull/24781#issuecomment-215914822):
  "Workflow" is a very broad, diverse area, with solutions typically
  tailored to specific use cases (e.g., data-flow graphs, data-driven
  processing, deployment pipelines, event-driven automation,
  business-process execution, iPaaS) and specific input and event
  sources, and it often requires arbitrary code to evaluate conditions,
  actions, and/or failure handling.
* Human-oriented and programmatic interfaces over the Kubernetes API,
  other than “basic” CLIs (e.g., kubectl) and UIs (e.g., the dashboard),
  such as mobile dashboards, IDEs, chat bots, SQL interfaces,
  interactive shells, etc.