Merge pull request #27466 from chrisnegus/prod-env-text
Adding new Production Environment section

---
title: "Production environment"
description: Create a production-quality Kubernetes cluster
weight: 30
no_list: true
---

<!-- overview -->

A production-quality Kubernetes cluster requires planning and preparation.
If your Kubernetes cluster is to run critical workloads, it must be configured to be resilient.
This page explains steps you can take to set up a production-ready cluster,
or to promote an existing cluster for production use.
If you're already familiar with production setup and want the links, skip to
[What's next](#what-s-next).

<!-- body -->

## Production considerations

Typically, a production Kubernetes cluster environment has more requirements than a
personal learning, development, or test environment. A production environment may require
secure access by many users, consistent availability, and the resources to adapt
to changing demands.

As you decide where you want your production Kubernetes environment to live
(on premises or in a cloud) and the amount of management you want to take
on or hand to others, consider how your requirements for a Kubernetes cluster
are influenced by the following issues:

- *Availability*: A single-machine Kubernetes [learning environment](/docs/setup/#learning-environment)
  has a single point of failure. Creating a highly available cluster means considering:
  - Separating the control plane from the worker nodes.
  - Replicating the control plane components on multiple nodes.
  - Load balancing traffic to the cluster’s {{< glossary_tooltip term_id="kube-apiserver" text="API server" >}}.
  - Having enough worker nodes available, or able to quickly become available, as changing workloads warrant it.

- *Scale*: If you expect your production Kubernetes environment to receive a stable amount of
  demand, you might be able to set up for the capacity you need and be done. However,
  if you expect demand to grow over time or change dramatically based on things like
  season or special events, you need to plan how to scale up to relieve increased
  pressure from more requests to the control plane and worker nodes, or scale down to free
  unused resources.

- *Security and access management*: You have full admin privileges on your own
  Kubernetes learning cluster. But shared clusters with important workloads, and
  more than one or two users, require a more refined approach to who and what can
  access cluster resources. You can use role-based access control
  ([RBAC](/docs/reference/access-authn-authz/rbac/)) and other
  security mechanisms to make sure that users and workloads can get access to the
  resources they need, while keeping workloads, and the cluster itself, secure.
  You can set limits on the resources that users and workloads can access
  by managing [policies](/docs/concepts/policy/) and
  [container resources](/docs/concepts/configuration/manage-resources-containers/),
  as sketched in the example after this list.
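
For example, here is a minimal sketch of per-container limits; the pod name, image,
and values are hypothetical placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-limits        # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    resources:
      requests:                # capacity the scheduler reserves for this container
        memory: "256Mi"
        cpu: "250m"
      limits:                  # hard caps enforced at runtime
        memory: "512Mi"
        cpu: "500m"
```

Combined with per-namespace quotas (covered later on this page), limits like these
help keep a single workload from starving its neighbors.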

Before building a Kubernetes production environment on your own, consider
handing off some or all of this job to
[Turnkey Cloud Solutions](/docs/setup/production-environment/turnkey-solutions/)
providers or other [Kubernetes Partners](https://kubernetes.io/partners/).
Options include:

- *Serverless*: Just run workloads on third-party equipment without managing
  a cluster at all. You will be charged for things like CPU usage, memory, and
  disk requests.
- *Managed control plane*: Let the provider manage the scale and availability
  of the cluster's control plane, as well as handle patches and upgrades.
- *Managed worker nodes*: Configure pools of nodes to meet your needs,
  then the provider makes sure those nodes are available and ready to implement
  upgrades when needed.
- *Integration*: There are providers that integrate Kubernetes with other
  services you may need, such as storage, container registries, authentication
  methods, and development tools.

Whether you build a production Kubernetes cluster yourself or work with
partners, review the following sections to evaluate your needs as they relate
to your cluster’s *control plane*, *worker nodes*, *user access*, and
*workload resources*.

## Production cluster setup

In a production-quality Kubernetes cluster, the control plane manages the
cluster from services that can be spread across multiple computers
in different ways. Each worker node, however, represents a single entity that
is configured to run Kubernetes pods.

### Production control plane

The simplest Kubernetes cluster has the entire control plane and worker node
services running on the same machine. You can grow that environment by adding
worker nodes, as illustrated in the diagram in
[Kubernetes Components](/docs/concepts/overview/components/).
If the cluster is meant to be available for a short period of time, or can be
discarded if something goes seriously wrong, this might meet your needs.

If you need a more permanent, highly available cluster, however, you should
consider ways of extending the control plane. By design, control plane
services running on a single machine are not highly available.
If keeping the cluster up and running, and ensuring that it can be repaired
if something goes wrong, is important to you, consider these steps:

- *Choose deployment tools*: You can deploy a control plane using tools such
  as kubeadm, kops, and kubespray. See
  [Installing Kubernetes with deployment tools](/docs/setup/production-environment/tools/)
  to learn tips for production-quality deployments using each of those deployment
  methods. Different [Container Runtimes](/docs/setup/production-environment/container-runtimes/)
  are available to use with your deployments.
- *Manage certificates*: Secure communications between control plane services
  are implemented using certificates. Certificates are automatically generated
  during deployment, or you can generate them using your own certificate authority.
  See [PKI certificates and requirements](/docs/setup/best-practices/certificates/) for details.
- *Configure load balancer for apiserver*: Configure a load balancer
  to distribute external API requests to the apiserver service instances running on
  different nodes (see the sketch after this list). See
  [Create an External Load Balancer](/docs/tasks/access-application-cluster/create-external-load-balancer/)
  for details.
- *Separate and back up etcd service*: The etcd services can either run on the
  same machines as other control plane services or run on separate machines, for
  extra security and availability. Because etcd stores cluster configuration data,
  back up the etcd database regularly to ensure that you can
  repair that database if needed.
  See the [etcd FAQ](https://etcd.io/docs/v3.4/faq/) for details on configuring and using etcd.
  See [Operating etcd clusters for Kubernetes](/docs/tasks/administer-cluster/configure-upgrade-etcd/)
  and [Set up a High Availability etcd cluster with kubeadm](/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/)
  for details.
- *Create multiple control plane systems*: For high availability, the
  control plane should not be limited to a single machine. If the control plane
  services are run by an init service (such as systemd), each service should run on at
  least three machines. However, running control plane services as pods in
  Kubernetes ensures that the replicated number of services that you request
  will always be available.
  The scheduler should be fault tolerant, but not highly available.
  Some deployment tools set up the [Raft](https://raft.github.io/)
  consensus algorithm to do leader election of Kubernetes services. If the
  primary goes away, another service elects itself and takes over.
- *Span multiple zones*: If keeping your cluster available at all times is
  critical, consider creating a cluster that runs across multiple data centers,
  referred to as zones in cloud environments. Groups of zones are referred to as regions.
  Spreading a cluster across multiple zones in the same region improves the chances that your
  cluster will continue to function even if one zone becomes unavailable.
  See [Running in multiple zones](/docs/setup/best-practices/multiple-zones/) for details.
- *Manage ongoing features*: If you plan to keep your cluster over time,
  there are tasks you need to do to maintain its health and security. For example,
  if you installed with kubeadm, there are instructions to help you with
  [Certificate Management](/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/)
  and [Upgrading kubeadm clusters](/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/).
  See [Administer a Cluster](/docs/tasks/administer-cluster/)
  for a longer list of Kubernetes administrative tasks.
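
As a sketch of how several of these choices can come together, here is a minimal
kubeadm `ClusterConfiguration` for a highly available control plane. It assumes a
kubeadm-based deployment; the load balancer address and etcd endpoints are
hypothetical placeholders, not values prescribed by this page:

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
# Hypothetical DNS name of a load balancer that fronts all apiserver replicas.
controlPlaneEndpoint: "k8s-api.example.com:6443"
etcd:
  external:
    endpoints:                     # etcd running on separate machines (placeholders)
      - https://etcd-0.example.com:2379
      - https://etcd-1.example.com:2379
      - https://etcd-2.example.com:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

With a file like this, running `kubeadm init --config` on the first control plane
machine and `kubeadm join --control-plane` on the others would build out redundant
control plane instances behind the load balancer; see the highly available control
plane examples linked below for the authoritative steps.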

To learn about available options when you run control plane services, see
[kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/),
[kube-controller-manager](/docs/reference/command-line-tools-reference/kube-controller-manager/),
and [kube-scheduler](/docs/reference/command-line-tools-reference/kube-scheduler/)
component pages. For highly available control plane examples, see
[Options for Highly Available topology](/docs/setup/production-environment/tools/kubeadm/ha-topology/),
[Creating Highly Available clusters with kubeadm](/docs/setup/production-environment/tools/kubeadm/high-availability/),
and [Operating etcd clusters for Kubernetes](/docs/tasks/administer-cluster/configure-upgrade-etcd/).
See [Backing up an etcd cluster](/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster)
for information on making an etcd backup plan.

### Production worker nodes

Production-quality workloads need to be resilient and anything they rely
on needs to be resilient (such as CoreDNS). Whether you manage your own
control plane or have a cloud provider do it for you, you still need to
consider how you want to manage your worker nodes (also referred to
simply as *nodes*).

- *Configure nodes*: Nodes can be physical or virtual machines. If you want to
  create and manage your own nodes, you can install a supported operating system,
  then add and run the appropriate
  [Node services](/docs/concepts/overview/components/#node-components). Consider:
  - The demands of your workloads when you set up nodes, by having appropriate memory, CPU, and disk speed and storage capacity available.
  - Whether generic computer systems will do, or whether you have workloads that need GPU processors, Windows nodes, or VM isolation.
- *Validate nodes*: See [Valid node setup](/docs/setup/best-practices/node-conformance/)
  for information on how to ensure that a node meets the requirements to join
  a Kubernetes cluster.
- *Add nodes to the cluster*: If you are managing your own cluster you can
  add nodes by setting up your own machines and either adding them manually or
  having them register themselves to the cluster’s apiserver (see the sketch after
  this list). See the
  [Nodes](/docs/concepts/architecture/nodes/) section for information on how to set up Kubernetes to add nodes in these ways.
- *Add Windows nodes to the cluster*: Kubernetes offers support for Windows
  worker nodes, allowing you to run workloads implemented in Windows containers. See
  [Windows in Kubernetes](/docs/setup/production-environment/windows/) for details.
- *Scale nodes*: Have a plan for expanding the capacity your cluster will
  eventually need. See [Considerations for large clusters](/docs/setup/best-practices/cluster-large/)
  to help determine how many nodes you need, based on the number of pods and
  containers you need to run. If you are managing nodes yourself, this can mean
  purchasing and installing your own physical equipment.
- *Autoscale nodes*: Most cloud providers support
  [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#readme)
  to replace unhealthy nodes or grow and shrink the number of nodes as demand requires. See the
  [Frequently Asked Questions](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md)
  for how the autoscaler works and
  [Deployment](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler#deployment)
  for how it is implemented by different cloud providers. For on-premises clusters,
  some virtualization platforms can be scripted to spin up new nodes
  based on demand.
- *Set up node health checks*: For important workloads, you want to make sure
  that the nodes and pods running on those nodes are healthy. Using the
  [Node Problem Detector](/docs/tasks/debug-application-cluster/monitor-node-health/)
  daemon, you can monitor node health.
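
As a sketch of how a machine can register itself with the cluster, here is a
kubeadm `JoinConfiguration`, assuming a kubeadm-managed cluster; the endpoint,
token, CA hash, and node label are hypothetical placeholders:

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    # Hypothetical load-balanced apiserver endpoint for this cluster.
    apiServerEndpoint: "k8s-api.example.com:6443"
    token: "abcdef.0123456789abcdef"   # placeholder bootstrap token
    caCertHashes:
      - "sha256:0000000000000000000000000000000000000000000000000000000000000000"
nodeRegistration:
  kubeletExtraArgs:
    node-labels: "pool=general"        # example label for grouping nodes into a pool
```

Running `kubeadm join --config` with a file like this is one way a node can be
enrolled; cloud providers and autoscalers typically automate the same step.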

## Production user management

In production, you may be moving from a model where you or a small group of
people are accessing the cluster to one where dozens or hundreds of people
may have access. In a learning environment or platform prototype, you might have a single
administrative account for everything you do. In production, you will want
more accounts with different levels of access to different namespaces.

Taking on a production-quality cluster means deciding how you
want to selectively allow access by other users. In particular, you need to
select strategies for validating the identities of those who try to access your
cluster (authentication) and deciding if they have permissions to do what they
are asking (authorization):

- *Authentication*: The apiserver can authenticate users using client
  certificates, bearer tokens, an authenticating proxy, or HTTP basic auth.
  You can choose which authentication methods you want to use.
  Using plugins, the apiserver can leverage your organization’s existing
  authentication methods, such as LDAP or Kerberos. See
  [Authentication](/docs/reference/access-authn-authz/authentication/)
  for a description of these different methods of authenticating Kubernetes users.
- *Authorization*: When you set out to authorize your regular users, you will probably choose between RBAC and ABAC authorization. See [Authorization Overview](/docs/reference/access-authn-authz/authorization/) to review different modes for authorizing user accounts (as well as service account access to your cluster):
  - *Role-based access control* ([RBAC](/docs/reference/access-authn-authz/rbac/)): Lets you assign access to your cluster by allowing specific sets of permissions to authenticated users. Permissions can be assigned for a specific namespace (Role) or across the entire cluster (ClusterRole). Then using RoleBindings and ClusterRoleBindings, those permissions can be attached to particular users. A minimal sketch follows this list.
  - *Attribute-based access control* ([ABAC](/docs/reference/access-authn-authz/abac/)): Lets you create policies based on resource attributes in the cluster and will allow or deny access based on those attributes. Each line of a policy file identifies versioning properties (apiVersion and kind) and a map of spec properties to match the subject (user or group), resource property, non-resource property (/version or /apis), and readonly. See [Examples](/docs/reference/access-authn-authz/abac/#examples) for details.
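
For example, here is a minimal RBAC sketch that grants a hypothetical user
read-only access to Pods in one namespace; the user and namespace names are
placeholders:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: team-a            # hypothetical namespace
  name: pod-reader
rules:
- apiGroups: [""]              # "" means the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a
subjects:
- kind: User
  name: jane                   # placeholder user from your authentication system
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

A ClusterRole and ClusterRoleBinding follow the same shape when the permissions
need to span the whole cluster rather than a single namespace.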

As someone setting up authentication and authorization on your production Kubernetes cluster, here are some things to consider:

- *Set the authorization mode*: When the Kubernetes API server
  ([kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/))
  starts, the supported authorization modes must be set using the *--authorization-mode*
  flag. For example, that flag in the *kube-apiserver.yaml* file (in */etc/kubernetes/manifests*)
  could be set to Node,RBAC. This would allow Node and RBAC authorization for
  authenticated requests (see the sketch after this list).
- *Create user certificates and role bindings (RBAC)*: If you are using RBAC
  authorization, users can create a CertificateSigningRequest (CSR) that can be
  signed by the cluster CA. Then you can bind Roles and ClusterRoles to each user.
  See [Certificate Signing Requests](/docs/reference/access-authn-authz/certificate-signing-requests/)
  for details.
- *Create policies that combine attributes (ABAC)*: If you are using ABAC
  authorization, you can assign combinations of attributes to form policies to
  authorize selected users or groups to access particular resources (such as a
  pod), namespace, or apiGroup. For more information, see
  [Examples](/docs/reference/access-authn-authz/abac/#examples).
- *Consider Admission Controllers*: Additional forms of authorization for
  requests that can come in through the API server include
  [Webhook Token Authentication](/docs/reference/access-authn-authz/authentication/#webhook-token-authentication).
  Webhooks and other special authorization types need to be enabled by adding
  [Admission Controllers](/docs/reference/access-authn-authz/admission-controllers/)
  to the API server.
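
As a sketch of where that flag lives, here is an abbreviated kube-apiserver
static Pod manifest, with most required flags omitted; the image tag is
illustrative only:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.21.0   # version is illustrative
    command:
    - kube-apiserver
    # Allow kubelet (Node) authorization, then fall through to RBAC.
    - --authorization-mode=Node,RBAC
    # ...many other required flags omitted from this sketch...
```

In a kubeadm cluster the kubelet watches */etc/kubernetes/manifests*, so editing
this file restarts the apiserver with the new modes.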

## Set limits on workload resources

Demands from production workloads can cause pressure both inside and outside
of the Kubernetes control plane. Consider these items when setting up for the
needs of your cluster's workloads:

- *Set namespace limits*: Set per-namespace quotas on things like memory and CPU
  (see the sketch after this list). See
  [Manage Memory, CPU, and API Resources](/docs/tasks/administer-cluster/manage-resources/)
  for details. You can also set
  [Hierarchical Namespaces](/blog/2020/08/14/introducing-hierarchical-namespaces/)
  for inheriting limits.
- *Prepare for DNS demand*: If you expect workloads to massively scale up,
  your DNS service must be ready to scale up as well. See
  [Autoscale the DNS service in a Cluster](/docs/tasks/administer-cluster/dns-horizontal-autoscaling/).
- *Create additional service accounts*: User accounts determine what users can
  do on a cluster, while a service account defines pod access within a particular
  namespace. By default, a pod takes on the default service account from its namespace.
  See [Managing Service Accounts](/docs/reference/access-authn-authz/service-accounts-admin/)
  for information on creating a new service account. For example, you might want to:
  - Add secrets that a pod could use to pull images from a particular container registry. See [Configure Service Accounts for Pods](/docs/tasks/configure-pod-container/configure-service-account/) for an example.
  - Assign RBAC permissions to a service account. See [ServiceAccount permissions](/docs/reference/access-authn-authz/rbac/#service-account-permissions) for details.
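
To illustrate the first and last items above, here is a minimal sketch of a
per-namespace quota and a service account that carries a registry pull secret;
all names and values are hypothetical:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a            # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"          # total CPU all pods in the namespace may request
    requests.memory: 8Gi
    limits.cpu: "8"            # total CPU limit across the namespace
    limits.memory: 16Gi
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: builder
  namespace: team-a
imagePullSecrets:
- name: private-registry-creds # pre-created docker-registry Secret (placeholder)
```

Pods that reference the `builder` service account can then pull from the private
registry without each pod spec repeating the credential.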

## What's next {#what-s-next}

- Decide if you want to build your own production Kubernetes or obtain one from
  available [Turnkey Cloud Solutions](/docs/setup/production-environment/turnkey-solutions/)
  or [Kubernetes Partners](https://kubernetes.io/partners/).
- If you choose to build your own cluster, plan how you want to
  handle [certificates](/docs/setup/best-practices/certificates/)
  and set up high availability for features such as
  [etcd](/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/)
  and the
  [API server](/docs/setup/production-environment/tools/kubeadm/ha-topology/).
- Choose from [kubeadm](/docs/setup/production-environment/tools/kubeadm/), [kops](/docs/setup/production-environment/tools/kops/) or [Kubespray](/docs/setup/production-environment/tools/kubespray/)
  deployment methods.
- Configure user management by determining your
  [Authentication](/docs/reference/access-authn-authz/authentication/) and
  [Authorization](/docs/reference/access-authn-authz/authorization/) methods.
- Prepare for application workloads by setting up
  [resource limits](/docs/tasks/administer-cluster/manage-resources/),
  [DNS autoscaling](/docs/tasks/administer-cluster/dns-horizontal-autoscaling/)
  and [service accounts](/docs/reference/access-authn-authz/service-accounts-admin/).