kubeadm-join-master2

This commit is contained in:
fabriziopandini 2018-07-13 13:25:45 +02:00
parent 74d88f25ff
commit e7520a4028
3 changed files with 436 additions and 449 deletions


@@ -1 +1 @@
14
15


@@ -0,0 +1,435 @@
# kubeadm join --master workflow
## Metadata
```yaml
---
kep-number: 15
title: kubeadm join --master workflow
status: accepted
authors:
- "@fabriziopandini"
owning-sig: sig-cluster-lifecycle
reviewers:
- "@chuckha”
- "@detiber"
- "@luxas"
approvers:
- "@luxas"
- "@timothysc"
editor:
- "@fabriziopandini"
creation-date: 2018-01-28
last-updated: 2018-06-29
see-also:
- KEP 0004
```
## Table of Contents
<!-- TOC -->
- [kubeadm join --master workflow](#kubeadm-join---master-workflow)
- [Metadata](#metadata)
- [Table of Contents](#table-of-contents)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-goals](#non-goals)
- [Challenges and Open Questions](#challenges-and-open-questions)
- [Proposal](#proposal)
- [User Stories](#user-stories)
- [Create a cluster with more than one master node (static workflow)](#create-a-cluster-with-more-than-one-master-node-static-workflow)
- [Add a new master node (dynamic workflow)](#add-a-new-master-node-dynamic-workflow)
- [Implementation Details](#implementation-details)
- [Initialize the Kubernetes cluster](#initialize-the-kubernetes-cluster)
- [Preparing for execution of kubeadm join --master](#preparing-for-execution-of-kubeadm-join---master)
- [The kubeadm join --master workflow](#the-kubeadm-join---master-workflow)
- [Dynamic workflow (advertise-address == `controlplaneAddress`)](#dynamic-workflow-advertise-address--controlplaneaddress)
- [Static workflow (advertise-address != `controlplaneAddress`)](#static-workflow-advertise-address--controlplaneaddress)
- [Strategies for deploying control plane components](#strategies-for-deploying-control-plane-components)
- [Strategies for distributing cluster certificates](#strategies-for-distributing-cluster-certificates)
- [`kubeadm upgrade` for HA clusters](#kubeadm-upgrade-for-ha-clusters)
- [Graduation Criteria](#graduation-criteria)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /TOC -->
## Summary
We are extending kubeadm's distinctive `init` and `join` workflow, introducing the
capability to add more than one master node to an existing cluster by means of the
new `kubeadm join --master` option (in the alpha release the flag will be named `--experimental-master`).
As a consequence, kubeadm will provide a best-practice, “fast path” for creating a
minimum viable, conformant Kubernetes cluster with one or more master nodes and
zero or more worker nodes; as detailed in the following paragraphs, please note that
this proposal doesn't solve every possible use case, nor the full end-to-end flow, automatically.
## Motivation
Support for high availability is one of the most requested features for kubeadm.
Even though, as of today, it is already possible to create an HA cluster
using kubeadm in combination with some scripts and/or automation tools (e.g.
[this](https://kubernetes.io/docs/setup/independent/high-availability/)), this KEP was
designed with the objective of introducing a simple and reliable upstream solution for
achieving the same goal.
Such a solution will provide a consistent and repeatable base for implementing additional
capabilities, e.g. kubeadm upgrade for HA clusters.
### Goals
- "Divide and conquer”
This proposal - at least in its initial release - does not address all the possible
user stories for creating an highly available Kubernetes cluster, but instead
focuses on:
- Defining a generic and extensible flow for bootstrapping a cluster with multiple masters,
the `kubeadm join --master` workflow.
- Providing a solution *only* for well defined user stories. see
[User Stories](#user-stories) and [Non-goals](#non-goals).
- Enable higher-level tools integration
We expect higher-level tools to leverage kubeadm for creating HA clusters;
accordingly, the `kubeadm join --master` workflow should provide support for
the following operational practices used by higher-level tools:
- Parallel node creation
Higher-level tools could create nodes in parallel (both masters and workers)
in order to reduce the overall cluster startup time.
`kubeadm join --master` should natively support this practice without requiring
the implementation of any synchronization mechanics by higher-level tools.
- Provide support for both dynamic and static bootstrap flows
At the time a user is running `kubeadm init`, they might not know what
the cluster setup will look like eventually. For instance, the user may start with
only one master + n nodes, and then add further master nodes with `kubeadm join --master`
or add more worker nodes with `kubeadm join` (in any order). This kind of workflow, where the
user doesn't know in advance the final layout of the control plane instances, is referred to
in this document as the “dynamic bootstrap workflow”.
Nevertheless, kubeadm should also support a more “static bootstrap flow”, where the user knows
in advance the target layout of the control plane instances (the number, names and IPs
of the master nodes).
- Support different etcd deployment scenarios, and more specifically run master node components
and the etcd cluster on the same machines (stacked control plane nodes) or run the etcd
cluster on dedicated machines.
### Non-goals
- Graduating an existing node to master.
Nodes must be created either as masters or as workers, and are then expected to stick to the assigned role
for their entire life cycle.
- This proposal doesn't include a solution for etcd cluster management (but nothing in this proposal should
prevent addressing this in the future).
- This proposal doesn't include a solution for API server load balancing (nothing in this proposal
should prevent users from choosing their preferred solution for API server load balancing).
- This proposal doesn't address the ongoing discussion about kubeadm self-hosting; in light of
the divide-and-conquer goal stated above, it is not planned to provide support for self-hosted clusters
either in the initial proposal or in the foreseeable future (but nothing in this proposal should
explicitly prevent reconsidering this in the future).
- This proposal doesn't provide an automated solution for transferring the CA key and other required
certs from one master to the others. More specifically, this proposal doesn't address the ongoing
discussion about storage of kubeadm TLS assets in secrets, and it is not planned
to provide support for clusters with TLS assets stored in secrets (but nothing in this
proposal should explicitly prevent reconsidering this in the future).
- Nothing in this proposal should prevent practices that exist today.
### Challenges and Open Questions
- Keep the UX simple.
- _What are the acceptable trade-offs between the need to have a clean and simple
UX and the variety/complexity of possible kubernetes HA deployments?_
- Create a cluster without knowing its final layout
Supporting a dynamic workflow implies that some information about the cluster is
not available at init time, e.g. the number of master nodes, the IPs of
the master nodes, etc.
- _How to configure a Kubernetes cluster in order to easily adapt to future changes
of its own control plane layout, e.g. adding or removing a master node?_
- _What are the "pivotal" cluster settings that must be defined before initialising
the cluster?_
- _How to combine support for both static and dynamic bootstrap workflows into a
single UX?_
- Kubeadm's limited scope of action
- The kubeadm binary can execute actions _only_ on the machine where it is running;
e.g. it is not possible to execute actions on other nodes or to copy files across
nodes.
- During the join workflow, kubeadm can access the cluster _only_ using identities
with limited grants, namely `system:unauthenticated` or `system:node-bootstrapper`.
- Upgradability
- _How to set up a highly available cluster in order to simplify the execution
of cluster version upgrades, both manually and with the support of `kubeadm upgrade`?_
## Proposal
### User Stories
#### Create a cluster with more than one master node (static workflow)
As a kubernetes administrator, I want to create a Kubernetes cluster with more than one
master node*, of which I know in advance the names and the IPs.
\* A new "master node" is a new kubernetes node with
`node-role.kubernetes.io/master=""` label and
`node-role.kubernetes.io/master:NoSchedule` taint; a new instance of control plane
components will be deployed on the new master node.
As described in goals/non goals, in this first release of the proposal
creating a new master node doesn't trigger the creation of a new etcd member on the
same machine.
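For reference, the label and taint mentioned above correspond to the following fragment
of a Node object (a minimal sketch with a hypothetical node name; all other fields are omitted):
```yaml
apiVersion: v1
kind: Node
metadata:
  name: master-2                        # hypothetical node name
  labels:
    node-role.kubernetes.io/master: ""  # master label
spec:
  taints:
  - key: node-role.kubernetes.io/master # master taint
    effect: NoSchedule
```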
#### Add a new master node (dynamic workflow)
As a kubernetes administrator, (_at any time_) I want to add a new master node* to an existing
Kubernetes cluster.
### Implementation Details
#### Initialize the Kubernetes cluster
As of today, a Kubernetes cluster should be initialized by running `kubeadm init` on a
first master, afterwards referred to as the bootstrap master.
In order to support the `kubeadm join --master` workflow, a new Kubernetes cluster is
expected to satisfy the following conditions:
- The cluster must have a stable `controlplaneAddress` endpoint (aka the IP/DNS of the
external load balancer)
- The cluster must use an external etcd.
All the above conditions/settings can be set by passing a configuration file to `kubeadm init`.
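As a minimal sketch, assuming the kubeadm `v1alpha2` configuration format and placeholder
addresses (field names may differ in other config versions), such a configuration file
could look like:
```yaml
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
api:
  # the stable controlplaneAddress endpoint (IP/DNS of the external load balancer)
  controlPlaneEndpoint: "lb.example.com:6443"
etcd:
  external:
    # endpoints of the external etcd cluster (placeholder values)
    endpoints:
    - "https://10.0.0.11:2379"
    - "https://10.0.0.12:2379"
    - "https://10.0.0.13:2379"
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```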
#### Preparing for execution of kubeadm join --master
Before invoking `kubeadm join --master`, the user/higher-level tools
should copy the control plane certificates from an existing master node, e.g. the bootstrap master.
> NB. kubeadm is limited to executing actions *only*
> on the machine where it is running, so it is not possible to automatically copy
> certificates from remote locations.
Please note that, strictly speaking, only the ca and front-proxy-ca certificates and the service account key pair
are required to be equal among all masters. Accordingly:
- `kubeadm join --master` will check for the mandatory certificates and fail fast if
they are missing
- given that the required certificates exist, if some/all of the other certificates are provided
by the user as well, `kubeadm join --master` will use them without further checks.
- if any other certificates are missing, `kubeadm join --master` will create them.
> see "Strategies for distributing cluster certificates" paragraph for
> additional info about this step.
#### The kubeadm join --master workflow
The `kubeadm join --master` workflow will be implemented as an extension of the
existing `kubeadm join` flow.
`kubeadm join --master` will accept an additional parameter, that is the apiserver advertise
address of the joining node; as detailed in the following paragraphs, the value assigned to
this parameter depends on the user's choice between a dynamic bootstrap workflow and a static
bootstrap workflow.
The updated join workflow will be the following:
1. Discovery cluster info [No changes to this step]
> NB. This step waits for a first instance of the kube-apiserver to become ready
> (the bootstrap master); thus it acts as an embedded mechanism for handling the sequencing of
> `kubeadm init` and `kubeadm join` actions in case of parallel node creation.
2. Execute the kubelet TLS bootstrap process [No changes to this step]
3. In case of `join --master` [New step]
1. Using the bootstrap token as identity, read the `kubeadm-config` configMap
in the `kube-system` namespace.
> This requires granting access to the above configMap to the
> `system:bootstrappers` group (a sketch of such an RBAC rule is shown after this list).
2. Check if the cluster is ready for joining a new master node:
a. Check if the cluster has a stable `controlplaneAddress`
b. Check if the cluster uses an external etcd
c. Check if the mandatory certificates exist on the file system
3. Prepare the node for joining as a master node:
a. Create missing certificates (if any).
> Please note that by creating missing certificates kubeadm can adapt seamlessly
> to a dynamic workflow or to a static workflow (and to the apiserver advertise address
> of the joining node). See the following paragraphs for additional info.
b. In case of a control plane deployed as static pods, create the related kubeconfig files
and static pod manifests.
> See the "Strategies for deploying control plane components" paragraph
> for additional info about this step.
4. Create the admin.conf kubeconfig file
> This operation creates an additional root certificate that enables management of the cluster
> from the joining node and allows a simple and clean UX for the final steps of this workflow
> (similar to what happens for `kubeadm init`).
> However, it is important to notice that this certificate should be treated securely
> in order to avoid compromising the cluster.
5. Apply the master taint and label to the node.
6. Update the `kubeadm-config` configMap with the information about the new master node.
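The following is a sketch of the kind of RBAC rule, referenced in step 3.1 above, that lets
nodes joining with a bootstrap token read the `kubeadm-config` configMap (names are hypothetical
and this is not necessarily the exact manifest kubeadm will create):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubeadm-config-reader          # hypothetical name
  namespace: kube-system
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["kubeadm-config"]    # read access to this configMap only
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubeadm-config-reader          # hypothetical name
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kubeadm-config-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers           # identities using bootstrap tokens
```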
#### Dynamic workflow (advertise-address == `controlplaneAddress`)
There are many ways to configure a highly available cluster.
Among them, the approach best suited for a dynamic bootstrap workflow requires the
user to set the `--apiserver-advertise-address` of each master, including the bootstrap master
itself, equal to the `controlplaneAddress` endpoint provided during `kubeadm init`
(the IP/DNS of the external load balancer).
By using the same advertise address for all the masters, `kubeadm init` can create
a unique API server serving certificate that can be shared across many master nodes;
no changes will be required to this certificate when adding/removing master nodes.
Please note that:
- if the user is not planning to distribute the apiserver serving certificate among masters,
kubeadm will generate a new apiserver serving certificate “almost equal” to the certificate
created on the bootstrap master (it differs only in the domain name of the joining master)
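A minimal sketch of the corresponding settings, assuming the `v1alpha2` configuration format
and a placeholder load balancer address, could be:
```yaml
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
api:
  # every master advertises the address of the external load balancer itself
  advertiseAddress: "10.0.0.100"
  controlPlaneEndpoint: "10.0.0.100:6443"
```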
#### Static workflow (advertise-address != `controlplaneAddress`)
In case of a static bootstrap workflow the final layout of the control plane - the number,
names and IPs of the master nodes - is known in advance.
Given such information, the user can choose a different approach where each master has a
specific apiserver advertise address, different from the `controlplaneAddress`.
Please note that:
- if the user is not planning to distribute the apiserver certificate among masters, kubeadm
will generate a new apiserver serving certificate with the required SANs
- if the user is planning to distribute the apiserver certificate among masters, the
operator is required to provide during `kubeadm init` the list of names/IP
addresses of all the masters as alternative names for the API server serving certificate, thus
allowing the proper functioning of all the API server instances that will join
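A sketch of such an init configuration, assuming the `v1alpha2` format and hypothetical
master names/IPs listed as additional SANs:
```yaml
apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
api:
  controlPlaneEndpoint: "lb.example.com:6443"
# names/IPs of all the planned masters (hypothetical values), so that a single
# serving certificate can be distributed to and used by every master
apiServerCertSANs:
- "master-1"
- "master-2"
- "master-3"
- "10.0.0.11"
- "10.0.0.12"
- "10.0.0.13"
```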
#### Strategies for deploying control plane components
As of today kubeadm supports two solutions for deploying control plane components:
1. Control plane deployed as static pods (current kubeadm default)
2. Self-hosted control plane (currently alpha)
The proposed solution for case 1. "Control plane deployed as static pods", assumes
that the `kubeadm join --master` flow will take care of creating required kubeconfig
files and required static pod manifests.
As stated above, supporting for Self-hosted control plane is non goal for this
proposal.
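For reference, in this case the join flow writes kubeconfig files under `/etc/kubernetes/`
and static pod manifests under `/etc/kubernetes/manifests/`. A heavily truncated sketch of
such a manifest follows; only a few illustrative flags and a placeholder image tag are shown,
not the full set kubeadm generates:
```yaml
# e.g. /etc/kubernetes/manifests/kube-apiserver.yaml (truncated sketch)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.11.0     # illustrative version
    command:
    - kube-apiserver
    - --advertise-address=10.0.0.100             # placeholder address
    - --etcd-servers=https://10.0.0.11:2379      # external etcd endpoint
    volumeMounts:
    - name: k8s-certs
      mountPath: /etc/kubernetes/pki
      readOnly: true
  volumes:
  - name: k8s-certs
    hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
```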
#### Strategies for distributing cluster certificates
As of today kubeadm supports two solutions for storing cluster certificates:
1. Cluster certificates stored on file system (current kubeadm default)
2. Cluster certificates stored in secrets (currently alpha)
The proposed solution for case 1. "Cluster certificates stored on file system",
requires the user/the higher level tools to execute an additional action _before_
invoking `kubeadm join --master`.
More specifically, in case of cluster with "cluster certificates stored on file
system", before invoking `kubeadm join --master`, the user/higher level tools
should copy control plane certificates from an existing master node, e.g. bootstrap master
> NB. kubeadm is limited to execute actions *only*
in the machine where it is running, so it is not possible to copy automatically
certificates from remote locations.
Then, the `kubeadm join --master` flow will take care of checking certificates
existence and conformance.
As stated above, supporting for Cluster certificates stored in secrets is a non goal
for this proposal.
#### `kubeadm upgrade` for HA clusters
Nothing in this proposal prevents the implementation of `kubeadm upgrade` for HA clusters.
Further details will be provided in a subsequent release of this KEP, when all the details
of the `v1beta1` release of the kubeadm API are available (including a proper modeling
of a multi-master cluster).
## Graduation Criteria
- To create a periodic E2E test that bootstraps an HA cluster with kubeadm
and exercises the static bootstrap workflow
- To create a periodic E2E test that bootstraps an HA cluster with kubeadm
and exercises the dynamic bootstrap workflow
- To ensure upgradability of HA clusters (possibly with another E2E test)
- To document the kubeadm support for HA in kubernetes.io
## Implementation History
- original HA proposals [#1](https://goo.gl/QNtj5T) and [#2](https://goo.gl/C8V8PV)
- merged [Kubeadm HA design doc](https://goo.gl/QpD5h8)
- HA prototype [demo](https://goo.gl/2WLUUc) and [notes](https://goo.gl/NmTahy)
- [PR #58261](https://github.com/kubernetes/kubernetes/pull/58261) with the showcase implementation of the first release of this KEP
## Drawbacks
The `kubeadm join --master` workflow requires that some conditions are satisfied at `kubeadm init` time,
namely the use of a `controlplaneAddress` and of an external etcd.
Strictly speaking, this means that the `kubeadm join --master` workflow defined in this proposal supports
a dynamic workflow _only_ in some cases.
## Alternatives
1) Execute `kubeadm init` on many nodes
The approach based on execution of `kubeadm init` on each master was considered as well,
but not chosen because it seems to have several drawbacks:
- There is no real control over the parameters passed to `kubeadm init` executed on secondary masters,
and this might lead to unpredictable, inconsistent configurations.
- The init sequence for secondary masters won't go through the TLS bootstrap process,
and this might be perceived as a security concern.
- The init sequence executes a lot of steps which are unnecessary on a secondary master;
today those steps are mostly idempotent, so basically no harm is done by executing
them two or three times. Nevertheless, maintaining this contract in the future could be complex.
Additionally, by having a separate `kubeadm join --master` workflow instead of a single `kubeadm init`
workflow we can provide better support for:
- Steps that should be done in a slightly different way on a secondary master with respect
to the bootstrap master (e.g. updating the kubeadm-config configMap adding info about the new master instead
of creating a new configMap from scratch).
- Checking that the cluster/the kubeadm-config is properly configured for multiple masters
- Blocking users trying to create multiple masters with configurations we don't want to support as a SIG
(e.g. HA with a self-hosted control plane)


@@ -1,448 +0,0 @@
# kubeadm join --master workflow
## Metadata
```yaml
---
kep-number: draft-20180130
title: kubeadm join --master workflow
status: accepted
authors:
- "@fabriziopandini"
owning-sig: sig-cluster-lifecycle
reviewers:
- "@errordeveloper"
- "@jamiehannaford"
approvers:
- "@luxas"
- "@timothysc"
- "@roberthbailey"
editor:
- "@fabriziopandini"
creation-date: 2018-01-28
last-updated: 2018-01-28
see-also:
- KEP 0004
```
## Table of Contents
* [kubeadm join --master workflow](#kubeadm-join---master-workflow)
* [Metadata](#metadata)
* [Table of Contents](#table-of-contents)
* [Summary](#summary)
* [Motivation](#motivation)
* [Goals](#goals)
* [Non-goals](#non-goals)
* [Challenges and Open Questions](#challenges-and-open-questions)
* [Proposal](#proposal)
* [User Stories](#user-stories)
* [Add a new master node](#add-a-new-master-node)
* [Implementation Details](#implementation-details)
* [advertise-address = IP/DNS of the external load balancer](#advertise-address--ipdns-of-the-external-load-balancer)
* [kubeadm init --feature-gates=HighAvailability=true](#kubeadm-init---feature-gateshighavailabilitytrue)
* [kubeadm join --master workflow](#kubeadm-join---master-workflow-1)
* [Strategies for deploying control plane components](#strategies-for-deploying-control-plane-components)
* [Strategies for distributing cluster certificates](#strategies-for-distributing-cluster-certificates)
* [Graduation Criteria](#graduation-criteria)
* [Implementation History](#implementation-history)
* [Drawbacks](#drawbacks)
* [Alternatives](#alternatives)
## Summary
We are extending the kubeadm distinctive `init` and `join` workflow, introducing the
capability to add more than one master node to an existing cluster by means of the
new `kubeadm join --master` option.
As a consequence, kubeadm will provide a best-practice, “fast path” for creating a
minimum viable, conformant Kubernetes cluster with one or more master nodes and
zero or more worker nodes.
## Motivation
Support for high availability is one of the most requested features for kubeadm.
Even if, as of today, there is already the possibility to create an HA cluster
using kubeadm in combination with some scripts and/or automation tools (e.g.
[this](https://kubernetes.io/docs/setup/independent/high-availability/)), this KEP was
designed with the objective to introduce an upstream simple and reliable solution for
achieving the same goal.
### Goals
* "Divide and conquer”
This proposal - at least in its initial release - does not address all the possible
user stories for creating a highly available Kubernetes cluster, but instead
focuses on:
* Defining a generic and extensible flow for bootstrapping an HA cluster, the
`kubeadm join --master` workflow.
* Providing a solution *only* for one, well-defined user story; see
[User Stories](#user-stories) and [Non-goals](#non-goals).
* Provide support for a dynamic bootstrap flow
At the time a user is running `kubeadm init`, they might not know what
the cluster setup will look like eventually. For instance, the user may start with
only one master + n nodes, and then add further master nodes with `kubeadm join --master`
or add more worker nodes with `kubeadm join` (in any order).
* Enable higher-level tools integration
We expect higher-level and more tailored tooling to be built on top of kubeadm,
and ideally, using kubeadm as the basis of all deployments will make it easier
to create conformant clusters.
Accordingly, the `kubeadm join --master` workflow should provide support for
the following operational practices used by higher level tools:
* Parallel node creation
Higher-level tools could create nodes in parallel (both masters and workers)
for reducing the overall cluster startup time.
`kubeadm join --master` should natively support this practice without requiring
the implementation of any synchronization mechanics by higher-level tools.
* Replace reconciliation strategies
Especially in case of cloud deployments, higher-level automation tools could
decide for any reason to replace existing nodes with new ones (instead of applying
changes in-place to existing nodes).
`kubeadm join --master` will support this practice by making it easier to replace
existing master nodes with new ones.
### Non-goals
* By design, kubeadm cares only about bootstrapping, not about provisioning machines.
Likewise, installing various nice-to-have addons, like the Kubernetes Dashboard,
monitoring solutions, and cloud-specific addons, is not in scope.
* This proposal doesn't include a solution for etcd cluster management\*.
* Nothing in this proposal should prevent users from running master node components
and etcd on the same machines; however users should be aware that this will
introduce limitations for strategies like parallel node creation and In-place
vs. Replace reconciliation.
* Nothing in this proposal should prevent kubeadm from implementing, in the future, a
solution for provisioning an etcd cluster based on static pods/pods.
* This proposal doesn't include a solution for API server load balancing.
* Nothing in this proposal should prevent users from choosing their preferred
solution for API server load balancing.
* Nothing in this proposal should prevent practices that exist today.
* Nothing in this proposal should prevent users from pre-provisioning TLS assets
before running `kubeadm init` or `kubeadm join --master`.
\* At the time of writing, the CoreOS recommended approach for etcd is to run
the etcd cluster outside kubernetes (see discussion in [kubeadm office hours](https://goo.gl/fjyeqo)).
### Challenges and Open Questions
* Keep the UX simple.
* _What are the acceptable trade-offs between the need to have a clean and simple
UX and the complexity of the following challenges and open questions?_
* Create a cluster without knowing its final layout
Supporting a dynamic workflow implies that some information about the cluster is
not available at init time, e.g. the number of master nodes, the IPs of
the master nodes, etc.
* _How to configure a Kubernetes cluster in order to easily adapt to future changes
of its own layout, e.g. adding or removing a master node?_
* _What are the "pivotal" cluster settings that must be defined before initialising
the cluster?_
* _What are the mandatory conditions to be verified when executing `kubeadm init`
to allow/not allow the execution of `kubeadm join --master` in future?_
* Kubeadm's limited scope of action
* The kubeadm binary can execute actions _only_ on the machine where it is running;
e.g. it is not possible to execute actions on other nodes or to copy files across
nodes.
* During the join workflow, kubeadm can access the cluster _only_ using identities
with limited grants, `system:unauthenticated` or `system:node-bootstrapper`.
* Dependencies graduation
The solution for `kubeadm join --master` will rely on a set of dependencies/other
features which are still in the process for graduating to GA like e.g. dynamic
kubelet configuration, self-hosting, component config.
* _When `kubeadm join --master` should rely entirely on dependencies/features
still under graduation vs provide compatibility with older/less convenient but
more consolidated approaches?_
* _Should we support `kubeadm join --master` for cluster with a control plane
deployed as static pods? What about cluster with a self-hosted
control plane?_
* _Should we support `kubeadm join --master` only for cluster storing
cluster-certificates on file system? What about cluster
storing certificates in secrets?_
* Upgradability
* _How to set up a highly available cluster in order to simplify the execution
of cluster version upgrades, both manually and with the support of `kubeadm upgrade`?_
## Proposal
### User Stories
#### Add a new master node
As a kubernetes administrator, I want to run `kubeadm join --master` for adding
a new master node* to an existing Kubernetes cluster**, so that the cluster becomes
more resilient to failures of the existing master nodes (high availability).
\* A new "master node" is a new kubernetes node with
`node-role.kubernetes.io/master=""` label and
`node-role.kubernetes.io/master:NoSchedule` taint; a new instance of control plane
components will be deployed on the new master node
> NB. In this first release of the proposal creating a new master node doesn't
trigger the creation of a new etcd member on the same machine.
\*\* In this first release of the proposal, `kubeadm join --master` could be
executed _only_ on Kubernetes cluster compliant with following conditions:
* The cluster was initialized with `kubeadm init`.
* The cluster was initialized with `--feature-gates=HighAvailability=true`.
* The cluster uses an external etcd.
* An external load balancer was provisioned and the IP/DNS of the external
load balancer is used as advertise-address for the kube-api server.
### Implementation Details
#### advertise-address = IP/DNS of the external load balancer
There are many ways to configure a highly available cluster.
After prototyping and various discussions in
[kubeadm office hours](https://youtu.be/HcvVi8O_ZGY), it was agreed to implement
the approach that sets the `--advertise-address` equal to the IP/DNS of the
external load balancer, without assigning dedicated `--advertise-address` IPs
for each master nodes.
By excluding the IP of master nodes, kubeadm can create a unique API server
serving certificate, and share this certificate across many masters nodes;
no changes will be required to this certificate when adding/removing master nodes.
Such properties make this approach best suited for the initial set up of
the desired `kubeadm join --master` dynamic workflow.
> Please note that in this scenario the kubernetes service will always resolve
to the IP/DNS of the external load balancer, instead of resolving to the list
of master IPs, but this fact was considered an acceptable trade-off at this stage.
> It is expected to add support also for different HA configurations in future releases
of this KEP.
#### kubeadm init --feature-gates=HighAvailability=true
When executing `kubeadm join --master`, due to current kubeadm limitations, only
limited information about the cluster/about other master nodes is available.
As a consequence, this proposal delegates to the initial `kubeadm init` - when
executed with `--feature-gates=HighAvailability=true` - all the controls about
the compliance of the cluster with the supported user story:
* The cluster uses an external etcd.
* An external load balancer is provisioned and the IP/DNS of the external load balancer is used as advertise-address.
#### kubeadm join --master workflow
The `kubeadm join --master` target workflow is an extension of the
existing `kubeadm join` flow:
1. Discovery cluster info [No changes to this step]
Access the `cluster-info` configMap in `kube-public` namespace (or read
the same information provided in a file).
> This step waits for a first instance of the kube-apiserver to become ready;
such wait cycle acts as embedded mechanism for handling the sequence
`kubeadm init` and `kubeadm join` in case of parallel node creation.
2. In case of `join --master` [New step]
1. Using the bootstrap token as identity, read the `kubeadm-config` configMap
in `kube-system` namespace.
> This requires to grant access to the above configMap for
`system:bootstrappers` group (or to provide the same information
provided in a file like in 1.).
2. Check if the cluster is ready for joining a new master node:
a. Check if the cluster was created with `--feature-gates=HighAvailability=true`.
> We assume that all the necessary conditions were already checked
during `kubeadm init`:
> * The cluster uses an external etcd.
> * An external load balancer is provisioned and the IP/DNS of the external
load balancer is used as advertise-address.
b. In case of cluster certificates stored on file system, check if the
expected certificates exists.
> see "Strategies for distributing cluster certificates" paragraph for
additional info about this step.
3. Prepare the node for joining as a master node:
a. In case of control plane deployed as static pods, create kubeconfig files
and static pod manifests for control plane components.
> see "Strategies for deploying control plane components" paragraph
for additional info about this step.
4. Create the admin.conf kubeconfig file
3. Executes the TLS bootstrap process, including [No changes to this step]:
1. Start kubelet using the bootstrap token as identity
2. Request a certificate for the node - with the node identity - and retrieves
it after it is automatically approved
3. Restart kubelet with the node identity
4. Eventually, apply the kubelet dynamic configuration
4. In case of `join --master` [New step]
1. Apply master taint and label to the node.
> This action is executed using the admin.conf identity created above;
>
> This action triggers the deployment of master components in case of
self-hosted control plane
#### Strategies for deploying control plane components
As of today kubeadm supports two solutions for deploying control plane components:
1. Control plane deployed as static pods (current kubeadm default)
2. Self-hosted control plane in case of `--feature-gates=SelfHosting=true`
"Self-hosted control plane" is a solution that we expect - *in the long term* -
will become mainstream, because it simplifies both deployment and upgrade of control
plane components due to the fact that Kubernetes itself will take care of deploying
corresponding pods on nodes.
Unfortunately, at the time of writing it is unknown when this feature will graduate
to beta/GA or when this feature will become the new kubeadm default; as a consequence,
this proposal assumes that it is still required to provide a solution for both case 1
and case 2.
The proposed solution for case 1. "Control plane deployed as static pods", assumes
that the `kubeadm join --master` flow will take care of creating required kubeconfig
files and required static pod manifests.
Case 2. "Self-hosted control plane," as described above, does not requires any
additional steps to be implemented in the `kubeadm join --master` flow.
#### Strategies for distributing cluster certificates
As of today kubeadm supports two solutions for storing cluster certificates:
1. Cluster certificates stored on file system in case of:
* Control plane deployed as static pods (current kubeadm default)
* Self-hosted control plane in case of `--feature-gates=SelfHosting=true`
2. Cluster certificates stored in secrets in case of:
* Self-hosted control plane + secrets in certs in case of
`--feature-gates=SelfHosting=true,StoreCertsInSecrets=true`
"Storing cluster certificates in secrets" is a solution that we expect - *in the
long term* - will become mainstream, because it simplifies certificates distribution
and also certificate rotation, due to the fact that Kubernetes itself will take
care of distributing certs on nodes.
Unfortunately, at the time of writing it is unknown when this feature will graduate
to beta/GA or when this feature will become the new kubeadm default; as a
consequence, this proposal assumes it is required to provide a solution both
for case 1 and case 2.
The proposed solution for case 1. "Cluster certificates stored on file system",
requires the user/the higher level tools to execute an additional action _before_
invoking `kubeadm join --master` (NB. kubeadm is limited to execute actions *only*
in the machine where it is running, so it is not possible to copy automatically
certificates from remote locations).
More specifically, in case of cluster with "cluster certificates stored on file
system", before invoking `kubeadm join --master`, the user/higher level tools
should copy control plane certificates from an existing node, e.g. the node
where `kubeadm init` was run, to the joining node.
Then, the `kubeadm join --master` flow will take care of checking certificates
existence and conformance.
Case 2. "Cluster certificates stored in secrets", as described above, does not
requires any additional steps to be implemented in the `kubeadm join --master`
flow .
## Graduation Criteria
* To create a periodic E2E test that bootstraps an HA cluster with kubeadm
and exercises the dynamic bootstrap workflow
* To ensure upgradability of HA clusters (possibly with another E2E test)
* To document the kubeadm support for HA in kubernetes.io
## Implementation History
* original HA proposals [#1](https://goo.gl/QNtj5T) and [#2](https://goo.gl/C8V8PV)
* merged [Kubeadm HA design doc](https://goo.gl/QpD5h8)
* HA prototype [demo](https://goo.gl/2WLUUc) and [notes](https://goo.gl/NmTahy)
* [PR #58261](https://github.com/kubernetes/kubernetes/pull/58261)
## Drawbacks
This proposal provides support for a single, well defined HA scenario.
While this is considered a sustainable approach to the complexity of HA in Kubernetes,
the limited scope of this proposal could be negatively perceived by final users.
## Alternatives
1) Execute `kubeadm init` on many nodes
The approach based on execution of `kubeadm init` on each master was considered as well,
but not chosen because it seems to have several drawbacks:
* There is no real control over the parameters passed to `kubeadm init` executed on secondary masters,
and this can lead to unpredictable, inconsistent configurations.
* The init sequence for secondary masters won't go through the TLS bootstrap process,
and this can be perceived as a security concern.
* The init sequence executes a lot of steps which are unnecessary on a secondary master;
today those steps are mostly idempotent, so basically no harm is done by executing
them two or three times. Nevertheless, maintaining this contract in the future could be complex.
2) Allow HA configurations with `--advertise-address` equal to the master IP address
(adding the IP/DNS of the external load balancer as an additional apiServerCertSANs entry).
After some testing, this option was considered too complex/not
adequate for the initial set up of the desired `kubeadm join --master` dynamic workflow;
this can be better explained by looking at two implementations based on this option:
* [kubernetes the hard way](https://github.com/kelseyhightower/kubernetes-the-hard-way)
uses the IP address of all master nodes for creating a new API server
serving certificate before bootstrapping the cluster, but this approach
can't be used if considering the desired dynamic workflow.
* [Creating HA cluster with kubeadm](https://kubernetes.io/docs/setup/independent/high-availability/)
uses a different API server serving certificate for each master, and this
could increase the complexity of the first implementation because:
* the `kubeadm join --master` flow has to generate different certificates for
each master node.
* the self-hosted control plane should be adapted to mount different certificates
for each master.
* bootstrap checkpointing should be designed to checkpoint a different
set of certificates for each master.
* upgrades should be adapted to consider master specific settings