Merge pull request #27466 from chrisnegus/prod-env-text
Adding new Production Environment section

---
title: "Production environment"
description: Create a production-quality Kubernetes cluster
weight: 30
no_list: true
---

<!-- overview -->

A production-quality Kubernetes cluster requires planning and preparation.
If your Kubernetes cluster is to run critical workloads, it must be configured to be resilient.
This page explains steps you can take to set up a production-ready cluster,
or to promote an existing cluster for production use.
If you're already familiar with production setup and want the links, skip to
[What's next](#what-s-next).

<!-- body -->

## Production considerations

Typically, a production Kubernetes cluster environment has more requirements than a
personal learning, development, or test environment. A production environment may require
secure access by many users, consistent availability, and the resources to adapt
to changing demands.

As you decide where you want your production Kubernetes environment to live
(on premises or in a cloud) and the amount of management you want to take
on or hand off to others, consider how your requirements for a Kubernetes cluster
are influenced by the following issues:

- *Availability*: A single-machine Kubernetes [learning environment](/docs/setup/#learning-environment)
  has a single point of failure. Creating a highly available cluster means considering:
  - Separating the control plane from the worker nodes.
  - Replicating the control plane components on multiple nodes.
  - Load balancing traffic to the cluster’s {{< glossary_tooltip term_id="kube-apiserver" text="API server" >}}.
  - Having enough worker nodes available, or able to quickly become available, as changing workloads warrant it.

- *Scale*: If you expect your production Kubernetes environment to receive a stable amount of
  demand, you might be able to set up for the capacity you need and be done. However,
  if you expect demand to grow over time or change dramatically based on things like
  season or special events, you need to plan how to scale to relieve increased
  pressure from more requests to the control plane and worker nodes, or scale down to reduce unused
  resources.

- *Security and access management*: You have full admin privileges on your own
  Kubernetes learning cluster. But shared clusters with important workloads, and
  more than one or two users, require a more refined approach to who and what can
  access cluster resources. You can use role-based access control
  ([RBAC](/docs/reference/access-authn-authz/rbac/)) and other
  security mechanisms to make sure that users and workloads can get access to the
  resources they need, while keeping workloads, and the cluster itself, secure.
  You can set limits on the resources that users and workloads can access
  by managing [policies](https://kubernetes.io/docs/concepts/policy/) and
  [container resources](/docs/concepts/configuration/manage-resources-containers/)
  (see the sketch after this list for a per-container example).
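
For example, a minimal sketch of per-container requests and limits might look like the following; the namespace, object names, image, and values here are placeholders for illustration, not recommendations:

```yaml
# pod-with-limits.yaml -- illustrative sketch; names and values are hypothetical
apiVersion: v1
kind: Pod
metadata:
  name: web-frontend
  namespace: team-a            # a namespace you have created for one team
spec:
  containers:
  - name: web
    image: nginx:1.21          # example image
    resources:
      requests:                # what the scheduler reserves for the container
        cpu: 250m
        memory: 256Mi
      limits:                  # the maximum the container is allowed to consume
        cpu: 500m
        memory: 512Mi
```

Combined with per-namespace quotas (described later in this page), requests and limits like these keep any one workload from starving the rest of the cluster.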

Before building a Kubernetes production environment on your own, consider
handing off some or all of this job to
[Turnkey Cloud Solutions](/docs/setup/production-environment/turnkey-solutions/)
providers or other [Kubernetes Partners](https://kubernetes.io/partners/).
Options include:

- *Serverless*: Just run workloads on third-party equipment without managing
  a cluster at all. You will be charged for things like CPU usage, memory, and
  disk requests.
- *Managed control plane*: Let the provider manage the scale and availability
  of the cluster's control plane, as well as handle patches and upgrades.
- *Managed worker nodes*: Configure pools of nodes to meet your needs,
  then the provider makes sure those nodes are available and ready to implement
  upgrades when needed.
- *Integration*: There are providers that integrate Kubernetes with other
  services you may need, such as storage, container registries, authentication
  methods, and development tools.

Whether you build a production Kubernetes cluster yourself or work with
partners, review the following sections to evaluate your needs as they relate
to your cluster’s *control plane*, *worker nodes*, *user access*, and
*workload resources*.

## Production cluster setup

In a production-quality Kubernetes cluster, the control plane manages the
cluster from services that can be spread across multiple computers
in different ways. Each worker node, however, represents a single entity that
is configured to run Kubernetes pods.

### Production control plane

The simplest Kubernetes cluster has the entire control plane and worker node
services running on the same machine. You can grow that environment by adding
worker nodes, as reflected in the diagram illustrated in
[Kubernetes Components](/docs/concepts/overview/components/).
If the cluster is meant to be available for a short period of time, or can be
discarded if something goes seriously wrong, this might meet your needs.

If you need a more permanent, highly available cluster, however, you should
consider ways of extending the control plane. By design, control plane
services running on a single machine are not highly available.
If keeping the cluster up and running
and ensuring that it can be repaired if something goes wrong is important,
consider these steps:

- *Choose deployment tools*: You can deploy a control plane using tools such
  as kubeadm, kops, and kubespray. See
  [Installing Kubernetes with deployment tools](/docs/setup/production-environment/tools/)
  to learn tips for production-quality deployments using each of those deployment
  methods. Different [Container Runtimes](/docs/setup/production-environment/container-runtimes/)
  are available to use with your deployments.
- *Manage certificates*: Secure communications between control plane services
  are implemented using certificates. Certificates are automatically generated
  during deployment or you can generate them using your own certificate authority.
  See [PKI certificates and requirements](/docs/setup/best-practices/certificates/) for details.
- *Configure load balancer for apiserver*: Configure a load balancer
  to distribute external API requests to the apiserver service instances running on different nodes
  (a minimal kubeadm configuration sketch appears after this list). See
  [Create an External Load Balancer](/docs/tasks/access-application-cluster/create-external-load-balancer/)
  for details.
- *Separate and back up etcd service*: The etcd services can either run on the
  same machines as other control plane services or run on separate machines, for
  extra security and availability. Because etcd stores cluster configuration data,
  the etcd database should be backed up regularly to ensure that you can
  repair that database if needed.
  See the [etcd FAQ](https://etcd.io/docs/v3.4/faq/) for details on configuring and using etcd.
  See [Operating etcd clusters for Kubernetes](/docs/tasks/administer-cluster/configure-upgrade-etcd/)
  and [Set up a High Availability etcd cluster with kubeadm](/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/)
  for details.
- *Create multiple control plane systems*: For high availability, the
  control plane should not be limited to a single machine. If the control plane
  services are run by an init service (such as systemd), each service should run on at
  least three machines. However, running control plane services as pods in
  Kubernetes ensures that the replicated number of services that you request
  will always be available.
  The scheduler should be fault tolerant,
  but not highly available. Some deployment tools set up the [Raft](https://raft.github.io/)
  consensus algorithm to do leader election of Kubernetes services. If the
  primary goes away, another service elects itself and takes over.
- *Span multiple zones*: If keeping your cluster available at all times is
  critical, consider creating a cluster that runs across multiple data centers,
  referred to as zones in cloud environments. Groups of zones are referred to as regions.
  By spreading a cluster across
  multiple zones in the same region, you can improve the chances that your
  cluster will continue to function even if one zone becomes unavailable.
  See [Running in multiple zones](/docs/setup/best-practices/multiple-zones/) for details.
- *Manage ongoing features*: If you plan to keep your cluster over time,
  there are tasks you need to do to maintain its health and security. For example,
  if you installed with kubeadm, there are instructions to help you with
  [Certificate Management](/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/)
  and [Upgrading kubeadm clusters](/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/).
  See [Administer a Cluster](/docs/tasks/administer-cluster/)
  for a longer list of Kubernetes administrative tasks.
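
To illustrate several of these points together, here is a minimal, hypothetical kubeadm configuration sketch for a highly available control plane. The load balancer address (`lb.example.com`), the etcd endpoints, and the Kubernetes version are placeholders; adapt them to your environment and to the kubeadm API version that your release supports.

```yaml
# kubeadm-config.yaml -- illustrative sketch only, not a complete production configuration
apiVersion: kubeadm.k8s.io/v1beta2      # check `kubeadm config print init-defaults` for your release
kind: ClusterConfiguration
kubernetesVersion: v1.21.0              # example version
controlPlaneEndpoint: "lb.example.com:6443"   # address of the load balancer in front of all apiservers
etcd:
  external:                             # omit this block to run stacked etcd on the control plane nodes
    endpoints:
    - https://etcd-1.example.com:2379
    - https://etcd-2.example.com:2379
    - https://etcd-3.example.com:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

You would pass a file like this to `kubeadm init --config kubeadm-config.yaml --upload-certs` on the first control plane node and then join the remaining control plane nodes; see the kubeadm high availability guides linked below for the authoritative procedure.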

To learn about available options when you run control plane services, see
[kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/),
[kube-controller-manager](/docs/reference/command-line-tools-reference/kube-controller-manager/),
and [kube-scheduler](/docs/reference/command-line-tools-reference/kube-scheduler/)
component pages. For highly available control plane examples, see
[Options for Highly Available topology](/docs/setup/production-environment/tools/kubeadm/ha-topology/),
[Creating Highly Available clusters with kubeadm](/docs/setup/production-environment/tools/kubeadm/high-availability/),
and [Operating etcd clusters for Kubernetes](/docs/tasks/administer-cluster/configure-upgrade-etcd/).
See [Backing up an etcd cluster](/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster)
for information on making an etcd backup plan.

### Production worker nodes

Production-quality workloads need to be resilient and anything they rely
on needs to be resilient (such as CoreDNS). Whether you manage your own
control plane or have a cloud provider do it for you, you still need to
consider how you want to manage your worker nodes (also referred to
simply as *nodes*).

- *Configure nodes*: Nodes can be physical or virtual machines. If you want to
  create and manage your own nodes, you can install a supported operating system,
  then add and run the appropriate
  [Node services](/docs/concepts/overview/components/#node-components). Consider:
  - The demands of your workloads when you set up nodes by having appropriate memory, CPU, and disk speed and storage capacity available.
  - Whether generic computer systems will do or you have workloads that need GPU processors, Windows nodes, or VM isolation.
- *Validate nodes*: See [Valid node setup](/docs/setup/best-practices/node-conformance/)
  for information on how to ensure that a node meets the requirements to join
  a Kubernetes cluster.
- *Add nodes to the cluster*: If you are managing your own cluster, you can
  add nodes by setting up your own machines and either adding them manually or
  having them register themselves to the cluster’s apiserver
  (a sketch of a manually registered Node object appears after this list). See the
  [Nodes](/docs/concepts/architecture/nodes/) section for information on how to set up Kubernetes to add nodes in these ways.
- *Add Windows nodes to the cluster*: Kubernetes offers support for Windows
  worker nodes, allowing you to run workloads implemented in Windows containers. See
  [Windows in Kubernetes](/docs/setup/production-environment/windows/) for details.
- *Scale nodes*: Have a plan for expanding the capacity your cluster will
  eventually need. See [Considerations for large clusters](/docs/setup/best-practices/cluster-large/)
  to help determine how many nodes you need, based on the number of pods and
  containers you need to run. If you are managing nodes yourself, this can mean
  purchasing and installing your own physical equipment.
- *Autoscale nodes*: Most cloud providers support
  [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#readme)
  to replace unhealthy nodes or grow and shrink the number of nodes as demand requires. See the
  [Frequently Asked Questions](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)
  for how the autoscaler works and
  [Deployment](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#deployment)
  for how it is implemented by different cloud providers. For on-premises clusters, some
  virtualization platforms can be scripted to spin up new nodes
  based on demand.
- *Set up node health checks*: For important workloads, you want to make sure
  that the nodes and pods running on those nodes are healthy. Using the
  [Node Problem Detector](/docs/tasks/debug-application-cluster/monitor-node-health/)
  daemon, you can ensure your nodes are healthy.
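
As a minimal sketch of manual node registration (the node name and labels below are hypothetical), you can create a Node object directly against the API server; the kubelet on that machine must then report the same node name and, for a purely manual workflow, run with self-registration disabled:

```yaml
# node-worker-01.yaml -- illustrative only; name and labels are placeholders
apiVersion: v1
kind: Node
metadata:
  name: worker-01                        # must match the node name the kubelet reports
  labels:
    node-role.kubernetes.io/worker: ""   # optional label, often used for display and scheduling
    topology.kubernetes.io/zone: zone-a  # example zone label for multi-zone awareness
```

You could apply this with `kubectl apply -f node-worker-01.yaml`; in most production setups, however, nodes self-register or are joined with a deployment tool such as kubeadm.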

## Production user management

In production, you may be moving from a model where you or a small group of
people are accessing the cluster to where there may potentially be dozens or
hundreds of people. In a learning environment or platform prototype, you might have a single
administrative account for everything you do. In production, you will want
more accounts with different levels of access to different namespaces.

Taking on a production-quality cluster means deciding how you
want to selectively allow access by other users. In particular, you need to
select strategies for validating the identities of those who try to access your
cluster (authentication) and deciding if they have permissions to do what they
are asking (authorization):

- *Authentication*: The apiserver can authenticate users using client
  certificates, bearer tokens, an authenticating proxy, or HTTP basic auth.
  You can choose which authentication methods you want to use.
  Using plugins, the apiserver can leverage your organization’s existing
  authentication methods, such as LDAP or Kerberos. See
  [Authentication](/docs/reference/access-authn-authz/authentication/)
  for a description of these different methods of authenticating Kubernetes users.
- *Authorization*: When you set out to authorize your regular users, you will probably choose
  between RBAC and ABAC authorization. See
  [Authorization Overview](/docs/reference/access-authn-authz/authorization/) to review
  different modes for authorizing user accounts (as well as service account access to your cluster):
  - *Role-based access control* ([RBAC](/docs/reference/access-authn-authz/rbac/)): Lets you
    assign access to your cluster by allowing specific sets of permissions to authenticated users.
    Permissions can be assigned for a specific namespace (Role) or across the entire cluster
    (ClusterRole). Then using RoleBindings and ClusterRoleBindings, those permissions can be
    attached to particular users (a minimal Role and RoleBinding sketch appears after this list).
  - *Attribute-based access control* ([ABAC](/docs/reference/access-authn-authz/abac/)): Lets you
    create policies based on resource attributes in the cluster and will allow or deny access
    based on those attributes. Each line of a policy file identifies versioning properties
    (apiVersion and kind) and a map of spec properties to match the subject (user or group),
    resource property, non-resource property (/version or /apis), and readonly.
    See [Examples](/docs/reference/access-authn-authz/abac/#examples) for details.
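
For example, here is a minimal, hypothetical Role and RoleBinding sketch that grants one authenticated user read-only access to Pods in a single namespace; the user name, namespace, and object names are placeholders:

```yaml
# read-pods-rbac.yaml -- illustrative sketch; names are hypothetical
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a                 # permissions apply only inside this namespace
rules:
- apiGroups: [""]                   # "" refers to the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-jane
  namespace: team-a
subjects:
- kind: User
  name: jane                        # must match the name presented during authentication
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```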

As someone setting up authentication and authorization on your production Kubernetes cluster, here are some things to consider:

- *Set the authorization mode*: When the Kubernetes API server
  ([kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/))
  starts, the authorization modes to use must be set with the *--authorization-mode*
  flag. For example, that flag in the *kube-apiserver.yaml* file (in */etc/kubernetes/manifests*)
  could be set to Node,RBAC. This would allow Node and RBAC authorization for authenticated requests
  (see the manifest fragment after this list).
- *Create user certificates and role bindings (RBAC)*: If you are using RBAC
  authorization, users can create a CertificateSigningRequest (CSR) that can be
  signed by the cluster CA. Then you can bind Roles and ClusterRoles to each user.
  See [Certificate Signing Requests](/docs/reference/access-authn-authz/certificate-signing-requests/)
  for details.
- *Create policies that combine attributes (ABAC)*: If you are using ABAC
  authorization, you can assign combinations of attributes to form policies to
  authorize selected users or groups to access particular resources (such as a
  pod), namespace, or apiGroup. For more information, see
  [Examples](/docs/reference/access-authn-authz/abac/#examples).
- *Consider Admission Controllers*: Additional forms of authorization for
  requests that can come in through the API server include
  [Webhook Token Authentication](/docs/reference/access-authn-authz/authentication/#webhook-token-authentication).
  Webhooks and other special authorization types need to be enabled by adding
  [Admission Controllers](/docs/reference/access-authn-authz/admission-controllers/)
  to the API server.
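
The following is an abbreviated, hypothetical fragment of a kubeadm-generated static Pod manifest showing where such a flag is set; the image tag is an example, and nearly all of the other required flags, volumes, and probes are omitted here:

```yaml
# Fragment of /etc/kubernetes/manifests/kube-apiserver.yaml (illustrative; most fields omitted)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.21.0   # example image and version
    command:
    - kube-apiserver
    - --authorization-mode=Node,RBAC           # enable the Node and RBAC authorizers
    # ...certificate, etcd, and service account flags omitted...
```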

## Set limits on workload resources

Demands from production workloads can cause pressure both inside and outside
of the Kubernetes control plane. Consider these items when setting up for the
needs of your cluster's workloads:

- *Set namespace limits*: Set per-namespace quotas on things like memory and CPU
  (a minimal ResourceQuota sketch appears after this list). See
  [Manage Memory, CPU, and API Resources](/docs/tasks/administer-cluster/manage-resources/)
  for details. You can also set
  [Hierarchical Namespaces](/blog/2020/08/14/introducing-hierarchical-namespaces/)
  for inheriting limits.
- *Prepare for DNS demand*: If you expect workloads to massively scale up,
  your DNS service must be ready to scale up as well. See
  [Autoscale the DNS service in a Cluster](/docs/tasks/administer-cluster/dns-horizontal-autoscaling/).
- *Create additional service accounts*: User accounts determine what users can
  do on a cluster, while a service account defines pod access within a particular
  namespace. By default, a pod takes on the default service account from its namespace.
  See [Managing Service Accounts](/docs/reference/access-authn-authz/service-accounts-admin/)
  for information on creating a new service account. For example, you might want to:
  - Add secrets that a pod could use to pull images from a particular container registry. See [Configure Service Accounts for Pods](/docs/tasks/configure-pod-container/configure-service-account/) for an example.
  - Assign RBAC permissions to a service account. See [ServiceAccount permissions](/docs/reference/access-authn-authz/rbac/#service-account-permissions) for details.
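
For example, a minimal ResourceQuota sketch for one namespace might look like this; the namespace, object name, and values are placeholders, not recommendations:

```yaml
# team-a-quota.yaml -- illustrative sketch; names and values are hypothetical
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a          # quota applies to everything in this namespace
spec:
  hard:
    requests.cpu: "8"        # total CPU requests allowed across all pods
    requests.memory: 16Gi
    limits.cpu: "16"         # total CPU limits allowed across all pods
    limits.memory: 32Gi
    pods: "50"               # maximum number of pods in the namespace
```

A LimitRange in the same namespace can supply default requests and limits for containers that do not set their own, so the quota can be enforced consistently.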

## What's next {#what-s-next}

- Decide if you want to build your own production Kubernetes or obtain one from
  available [Turnkey Cloud Solutions](/docs/setup/production-environment/turnkey-solutions/)
  or [Kubernetes Partners](https://kubernetes.io/partners/).
- If you choose to build your own cluster, plan how you want to
  handle [certificates](/docs/setup/best-practices/certificates/)
  and set up high availability for features such as
  [etcd](/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/)
  and the
  [API server](/docs/setup/production-environment/tools/kubeadm/ha-topology/).
- Choose from [kubeadm](/docs/setup/production-environment/tools/kubeadm/), [kops](/docs/setup/production-environment/tools/kops/), or [Kubespray](/docs/setup/production-environment/tools/kubespray/)
  deployment methods.
- Configure user management by determining your
  [Authentication](/docs/reference/access-authn-authz/authentication/) and
  [Authorization](/docs/reference/access-authn-authz/authorization/) methods.
- Prepare for application workloads by setting up
  [resource limits](/docs/tasks/administer-cluster/manage-resources/),
  [DNS autoscaling](/docs/tasks/administer-cluster/dns-horizontal-autoscaling/),
  and [service accounts](/docs/reference/access-authn-authz/service-accounts-admin/).