---
title: "Production environment"
description: Create a production-quality Kubernetes cluster
weight: 30
no_list: true
---
<!-- overview -->
A production-quality Kubernetes cluster requires planning and preparation.
If your Kubernetes cluster is to run critical workloads, it must be configured to be resilient.
This page explains steps you can take to set up a production-ready cluster,
or to promote an existing cluster for production use.
If you're already familiar with production setup and want the links, skip to
[What's next](#what-s-next).
<!-- body -->
## Production considerations
Typically, a production Kubernetes cluster environment has more requirements than a
personal learning, development, or test environment. A production environment may require
secure access by many users, consistent availability, and the resources to adapt
to changing demands.
As you decide where you want your production Kubernetes environment to live
(on premises or in a cloud) and the amount of management you want to take
on or hand to others, consider how your requirements for a Kubernetes cluster
are influenced by the following issues:
- *Availability*: A single-machine Kubernetes [learning environment](/docs/setup/#learning-environment)
has a single point of failure. Creating a highly available cluster means considering:
- Separating the control plane from the worker nodes.
- Replicating the control plane components on multiple nodes.
- Load balancing traffic to the cluster's {{< glossary_tooltip term_id="kube-apiserver" text="API server" >}}.
- Having enough worker nodes available, or able to quickly become available, as changing workloads warrant it.
- *Scale*: If you expect your production Kubernetes environment to receive a stable amount of
demand, you might be able to set up for the capacity you need and be done. However,
if you expect demand to grow over time or change dramatically based on things like
season or special events, you need to plan how to scale to relieve increased
pressure from more requests to the control plane and worker nodes or scale down to reduce unused
resources.
- *Security and access management*: You have full admin privileges on your own
Kubernetes learning cluster. But shared clusters with important workloads, and
more than one or two users, require a more refined approach to who and what can
access cluster resources. You can use role-based access control
([RBAC](/docs/reference/access-authn-authz/rbac/)) and other
security mechanisms to make sure that users and workloads can get access to the
resources they need, while keeping workloads, and the cluster itself, secure.
You can set limits on the resources that users and workloads can access
by managing [policies](/docs/concepts/policy/) and
[container resources](/docs/concepts/configuration/manage-resources-containers/); a brief example of setting container resource requests and limits follows this list.
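For example, a minimal sketch of a Pod manifest that declares resource requests and limits might look like the following; the namespace, names, image, and values here are placeholders rather than recommendations:

```yaml
# Sketch only: the namespace, names, image, and values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: web-frontend      # hypothetical workload name
  namespace: team-a       # hypothetical namespace
spec:
  containers:
  - name: web
    image: registry.example.com/web:1.2.3   # placeholder image
    resources:
      requests:           # the scheduler uses these values to place the Pod
        cpu: "250m"
        memory: "256Mi"
      limits:             # enforced at runtime
        cpu: "500m"
        memory: "512Mi"
```

The namespace quotas described later on this page constrain the sum of these values across all Pods in a namespace.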
Before building a Kubernetes production environment on your own, consider
handing off some or all of this job to
[Turnkey Cloud Solutions](/docs/setup/production-environment/turnkey-solutions/)
providers or other [Kubernetes Partners](https://kubernetes.io/partners/).
Options include:
- *Serverless*: Just run workloads on third-party equipment without managing
a cluster at all. You will be charged for things like CPU usage, memory, and
disk requests.
- *Managed control plane*: Let the provider manage the scale and availability
of the cluster's control plane, as well as handle patches and upgrades.
- *Managed worker nodes*: Configure pools of nodes to meet your needs,
then the provider makes sure those nodes are available and ready to implement
upgrades when needed.
- *Integration*: There are providers that integrate Kubernetes with other
services you may need, such as storage, container registries, authentication
methods, and development tools.
Whether you build a production Kubernetes cluster yourself or work with
partners, review the following sections to evaluate your needs as they relate
to your cluster's *control plane*, *worker nodes*, *user access*, and
*workload resources*.
## Production cluster setup
In a production-quality Kubernetes cluster, the control plane manages the
cluster from services that can be spread across multiple computers
in different ways. Each worker node, however, represents a single entity that
is configured to run Kubernetes pods.
### Production control plane
The simplest Kubernetes cluster has the entire control plane and worker node
services running on the same machine. You can grow that environment by adding
worker nodes, as reflected in the diagram illustrated in
[Kubernetes Components](/docs/concepts/overview/components/).
If the cluster is meant to be available for a short period of time, or can be
discarded if something goes seriously wrong, this might meet your needs.
If you need a more permanent, highly available cluster, however, you should
consider ways of extending the control plane. By design, control plane
services running on a single machine are not highly available.
If keeping the cluster up and running
and ensuring that it can be repaired if something goes wrong is important,
consider these steps:
- *Choose deployment tools*: You can deploy a control plane using tools such
as kubeadm, kops, and kubespray. See
[Installing Kubernetes with deployment tools](/docs/setup/production-environment/tools/)
to learn tips for production-quality deployments using each of those deployment
methods. Different [Container Runtimes](/docs/setup/production-environment/container-runtimes/)
are available to use with your deployments.
- *Manage certificates*: Secure communications between control plane services
are implemented using certificates. Certificates are automatically generated
during deployment or you can generate them using your own certificate authority.
See [PKI certificates and requirements](/docs/setup/best-practices/certificates/) for details.
- *Configure a load balancer for the apiserver*: Configure a load balancer
to distribute external API requests to the apiserver service instances running on different
nodes (a kubeadm configuration sketch appears after this list). See
[Create an External Load Balancer](/docs/tasks/access-application-cluster/create-external-load-balancer/)
for details.
- *Separate and back up the etcd service*: The etcd services can either run on the
same machines as other control plane services or run on separate machines, for
extra security and availability. Because etcd stores cluster configuration data,
backing up the etcd database should be done regularly to ensure that you can
repair that database if needed.
See the [etcd FAQ](https://etcd.io/docs/v3.4/faq/) for details on configuring and using etcd.
See [Operating etcd clusters for Kubernetes](/docs/tasks/administer-cluster/configure-upgrade-etcd/)
and [Set up a High Availability etcd cluster with kubeadm](/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/)
for details.
- *Create multiple control plane systems*: For high availability, the
control plane should not be limited to a single machine. If the control plane
services are run by an init service (such as systemd), each service should run on at
least three machines. However, running control plane services as pods in
Kubernetes ensures that the replicated number of services that you request
will always be available.
The scheduler should be fault tolerant,
but not highly available. Some deployment tools set up the [Raft](https://raft.github.io/)
consensus algorithm to do leader election of Kubernetes services. If the
primary goes away, another service elects itself and takes over.
- *Span multiple zones*: If keeping your cluster available at all times is
critical, consider creating a cluster that runs across multiple data centers,
referred to as zones in cloud environments. Groups of zones are referred to as regions.
Spreading a cluster across
multiple zones in the same region improves the chances that your
cluster will continue to function even if one zone becomes unavailable.
See [Running in multiple zones](/docs/setup/best-practices/multiple-zones/) for details.
- *Manage ongoing features*: If you plan to keep your cluster over time,
there are tasks you need to do to maintain its health and security. For example,
if you installed with kubeadm, there are instructions to help you with
[Certificate Management](/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/)
and [Upgrading kubeadm clusters](/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/).
See [Administer a Cluster](/docs/tasks/administer-cluster/)
for a longer list of Kubernetes administrative tasks.
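To tie several of the items above together, here is a minimal sketch, assuming a kubeadm-based deployment, of a `ClusterConfiguration` that points the control plane at a load-balanced endpoint and an external etcd cluster; the endpoint names, certificate paths, and version shown are placeholders:

```yaml
# Sketch only: lb.example.com, the etcd hosts, and the version are placeholders.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.21.0
# Clients and worker nodes reach the API through the load balancer,
# so a single control plane node can fail without losing the API.
controlPlaneEndpoint: "lb.example.com:6443"
etcd:
  external:
    endpoints:
    - https://etcd-0.example.com:2379
    - https://etcd-1.example.com:2379
    - https://etcd-2.example.com:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

You would pass a file like this to `kubeadm init --config`, then add further control plane nodes with `kubeadm join --control-plane`; other deployment tools express the same choices differently.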
To learn about available options when you run control plane services, see
[kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/),
[kube-controller-manager](/docs/reference/command-line-tools-reference/kube-controller-manager/),
and [kube-scheduler](/docs/reference/command-line-tools-reference/kube-scheduler/)
component pages. For highly available control plane examples, see
[Options for Highly Available topology](/docs/setup/production-environment/tools/kubeadm/ha-topology/),
[Creating Highly Available clusters with kubeadm](/docs/setup/production-environment/tools/kubeadm/high-availability/),
and [Operating etcd clusters for Kubernetes](/docs/tasks/administer-cluster/configure-upgrade-etcd/).
See [Backing up an etcd cluster](/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster)
for information on making an etcd backup plan.
### Production worker nodes
Production-quality workloads need to be resilient and anything they rely
on needs to be resilient (such as CoreDNS). Whether you manage your own
control plane or have a cloud provider do it for you, you still need to
consider how you want to manage your worker nodes (also referred to
simply as *nodes*).
- *Configure nodes*: Nodes can be physical or virtual machines. If you want to
create and manage your own nodes, you can install a supported operating system,
then add and run the appropriate
[Node services](/docs/concepts/overview/components/#node-components). Consider:
- The demands of your workloads: set up nodes with appropriate memory, CPU, and disk speed, and enough storage capacity.
- Whether generic computer systems will do, or whether you have workloads that need GPU processors, Windows nodes, or VM isolation.
- *Validate nodes*: See [Valid node setup](/docs/setup/best-practices/node-conformance/)
for information on how to ensure that a node meets the requirements to join
a Kubernetes cluster.
- *Add nodes to the cluster*: If you are managing your own cluster you can
add nodes by setting up your own machines and either adding them manually or
having them register themselves to the cluster's apiserver (a kubeadm join sketch appears after this list). See the
[Nodes](/docs/concepts/architecture/nodes/) section for information on how to set up Kubernetes to add nodes in these ways.
- *Add Windows nodes to the cluster*: Kubernetes offers support for Windows
worker nodes, allowing you to run workloads implemented in Windows containers. See
[Windows in Kubernetes](/docs/setup/production-environment/windows/) for details.
- *Scale nodes*: Have a plan for expanding the capacity your cluster will
eventually need. See [Considerations for large clusters](/docs/setup/best-practices/cluster-large/)
to help determine how many nodes you need, based on the number of pods and
containers you need to run. If you are managing nodes yourself, this can mean
purchasing and installing your own physical equipment.
- *Autoscale nodes*: Most cloud providers support
[Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#readme)
to replace unhealthy nodes or grow and shrink the number of nodes as demand requires. See the
[Frequently Asked Questions](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)
for how the autoscaler works and
[Deployment](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#deployment)
for how it is implemented by different cloud providers. For on-premises clusters, some
virtualization platforms can be scripted to spin up new nodes
based on demand.
- *Set up node health checks*: For important workloads, you want to make sure
that the nodes and pods running on those nodes are healthy. Using the
[Node Problem Detector](/docs/tasks/debug-application-cluster/monitor-node-health/)
daemon, you can ensure your nodes are healthy.
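As an illustration of registering a worker node, here is a minimal sketch of a kubeadm `JoinConfiguration`, again assuming a kubeadm-built cluster; the endpoint, token, CA hash, and node name are placeholders you would replace with values generated by your own control plane:

```yaml
# Sketch only: the endpoint, token, CA hash, and node name are placeholders.
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "lb.example.com:6443"   # the load-balanced API endpoint
    token: "abcdef.0123456789abcdef"           # bootstrap token created on the control plane
    caCertHashes:
    - "sha256:<hash-of-the-cluster-ca>"        # pins the cluster CA that the node trusts
nodeRegistration:
  name: worker-1
  kubeletExtraArgs:
    node-labels: "node-role.example.com/general=true"   # hypothetical label
```

Running `kubeadm join --config` with a file like this on the new machine registers it with the cluster's apiserver; nodes can also be registered manually, as described in the Nodes concept page.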
## Production user management
In production, you may be moving from a model where you or a small group of
people are accessing the cluster to where there may potentially be dozens or
hundreds of people. In a learning environment or platform prototype, you might have a single
administrative account for everything you do. In production, you will want
more accounts with different levels of access to different namespaces.
Taking on a production-quality cluster means deciding how you
want to selectively allow access by other users. In particular, you need to
select strategies for validating the identities of those who try to access your
cluster (authentication) and deciding if they have permissions to do what they
are asking (authorization):
- *Authentication*: The apiserver can authenticate users using client
certificates, bearer tokens, an authenticating proxy, or HTTP basic auth.
You can choose which authentication methods you want to use.
Using plugins, the apiserver can leverage your organization's existing
authentication methods, such as LDAP or Kerberos. See
[Authentication](/docs/reference/access-authn-authz/authentication/)
for a description of these different methods of authenticating Kubernetes users.
- *Authorization*: When you set out to authorize your regular users, you will probably choose between RBAC and ABAC authorization. See [Authorization Overview](/docs/reference/access-authn-authz/authorization/) to review different modes for authorizing user accounts (as well as service account access to your cluster):
- *Role-based access control* ([RBAC](/docs/reference/access-authn-authz/rbac/)): Lets you assign access to your cluster by allowing specific sets of permissions to authenticated users. Permissions can be assigned for a specific namespace (Role) or across the entire cluster (ClusterRole). Then using RoleBindings and ClusterRoleBindings, those permissions can be attached to particular users.
- *Attribute-based access control* ([ABAC](/docs/reference/access-authn-authz/abac/)): Lets you create policies based on resource attributes in the cluster and will allow or deny access based on those attributes. Each line of a policy file identifies versioning properties (apiVersion and kind) and a map of spec properties to match the subject (user or group), resource property, non-resource property (/version or /apis), and readonly. See [Examples](/docs/reference/access-authn-authz/abac/#examples) for details.
As someone setting up authentication and authorization on your production Kubernetes cluster, here are some things to consider:
- *Set the authorization mode*: When the Kubernetes API server
([kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/))
starts, the supported authorization modes must be set using the *--authorization-mode*
flag. For example, that flag in the *kube-apiserver.yaml* file (in */etc/kubernetes/manifests*)
could be set to Node,RBAC. This would allow Node and RBAC authorization for authenticated requests.
- *Create user certificates and role bindings (RBAC)*: If you are using RBAC
authorization, users can create a CertificateSigningRequest (CSR) that can be
signed by the cluster CA. Then you can bind Roles and ClusterRoles to each user
(a Role and RoleBinding sketch appears after this list).
See [Certificate Signing Requests](/docs/reference/access-authn-authz/certificate-signing-requests/)
for details.
- *Create policies that combine attributes (ABAC)*: If you are using ABAC
authorization, you can assign combinations of attributes to form policies to
authorize selected users or groups to access particular resources (such as a
pod), namespace, or apiGroup. For more information, see
[Examples](/docs/reference/access-authn-authz/abac/#examples).
- *Consider Admission Controllers*: Additional forms of authorization for
requests that can come in through the API server include
[Webhook Token Authentication](/docs/reference/access-authn-authz/authentication/#webhook-token-authentication).
Webhooks and other special authorization types need to be enabled by adding
[Admission Controllers](/docs/reference/access-authn-authz/admission-controllers/)
to the API server.
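As a concrete example of the RBAC bindings mentioned above, here is a minimal sketch of a namespaced Role and RoleBinding; the `team-a` namespace and the user `jane` are hypothetical names, and you would grant whatever verbs and resources your users actually need:

```yaml
# Sketch only: the namespace and user name are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-a
  name: pod-reader
rules:
- apiGroups: [""]              # "" means the core API group
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a
subjects:
- kind: User
  name: jane                   # must match the authenticated user name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

A ClusterRole and ClusterRoleBinding follow the same pattern when permissions should apply cluster-wide.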
## Set limits on workload resources
Demands from production workloads can cause pressure both inside and outside
of the Kubernetes control plane. Consider these items when setting up for the
needs of your cluster's workloads:
- *Set namespace limits*: Set per-namespace quotas on things like memory and CPU
(a ResourceQuota sketch appears after this list). See
[Manage Memory, CPU, and API Resources](/docs/tasks/administer-cluster/manage-resources/)
for details. You can also set
[Hierarchical Namespaces](/blog/2020/08/14/introducing-hierarchical-namespaces/)
for inheriting limits.
- *Prepare for DNS demand*: If you expect workloads to massively scale up,
your DNS service must be ready to scale up as well. See
[Autoscale the DNS service in a Cluster](/docs/tasks/administer-cluster/dns-horizontal-autoscaling/).
- *Create additional service accounts*: User accounts determine what users can
do on a cluster, while a service account defines pod access within a particular
namespace. By default, a pod takes on the default service account from its namespace.
See [Managing Service Accounts](/docs/reference/access-authn-authz/service-accounts-admin/)
for information on creating a new service account. For example, you might want to:
- Add secrets that a pod could use to pull images from a particular container registry. See [Configure Service Accounts for Pods](/docs/tasks/configure-pod-container/configure-service-account/) for an example.
- Assign RBAC permissions to a service account. See [ServiceAccount permissions](/docs/reference/access-authn-authz/rbac/#service-account-permissions) for details.
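As an example of a per-namespace limit, here is a minimal sketch of a ResourceQuota; the namespace and the hard limits are placeholder values you would size for your own workloads:

```yaml
# Sketch only: the namespace and the limits are placeholder values.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"       # total CPU all Pods in the namespace may request
    requests.memory: 8Gi
    limits.cpu: "8"         # total CPU limit across the namespace
    limits.memory: 16Gi
    pods: "20"              # maximum number of Pods in the namespace
```

Once a quota like this is in place, requests that would push the namespace past these totals are rejected.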
## What's next {#what-s-next}
- Decide if you want to build your own production Kubernetes or obtain one from
available [Turnkey Cloud Solutions](/docs/setup/production-environment/turnkey-solutions/)
or [Kubernetes Partners](https://kubernetes.io/partners/).
- If you choose to build your own cluster, plan how you want to
handle [certificates](/docs/setup/best-practices/certificates/)
and set up high availability for features such as
[etcd](/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/)
and the
[API server](/docs/setup/production-environment/tools/kubeadm/ha-topology/).
- Choose from [kubeadm](/docs/setup/production-environment/tools/kubeadm/), [kops](/docs/setup/production-environment/tools/kops/) or [Kubespray](/docs/setup/production-environment/tools/kubespray/)
deployment methods.
- Configure user management by determining your
[Authentication](/docs/reference/access-authn-authz/authentication/) and
[Authorization](/docs/reference/access-authn-authz/authorization/) methods.
- Prepare for application workloads by setting up
[resource limits](/docs/tasks/administer-cluster/manage-resources/),
[DNS autoscaling](/docs/tasks/administer-cluster/dns-horizontal-autoscaling/)
and [service accounts](/docs/reference/access-authn-authz/service-accounts-admin/).