## Kubernetes Bootstrap

This is an overview of how a Kubernetes cluster comes up when using kOps.

## From spec to complete configuration

The kOps tool itself takes the (minimal) spec of a cluster that the user specifies,
and computes a complete configuration, setting defaults where values are not specified,
and deriving appropriate dependencies. The "complete" specification includes the set
of all flags that will be passed to all components. All decisions about how to install the
cluster are made at this stage, and thus every decision can in theory be changed if the user
specifies a value in the spec.

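As a rough sketch (exact fields vary by kOps version), the user might provide only a couple of values and kOps derives the rest; the completed spec can be inspected with `kops get cluster --full -o yaml`:

```yaml
# Sketch: a minimal user-provided cluster spec (hypothetical values).
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: mycluster.com
spec:
  kubernetesVersion: 1.8.0
  networkCIDR: 172.20.0.0/16

# In the completed ("--full") spec, kOps has also derived the component
# flags, e.g. something like:
#
# spec:
#   kubelet:
#     cloudProvider: aws
#     clusterDNS: 100.64.0.10
```
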
This complete specification is set in the LaunchTemplate for the AutoScaling Group (on AWS),
or the Managed Instance Group (on GCE).

On both AWS & GCE, everything (nodes & masters) runs in an ASG/MIG; this means that failures
(or the user) can terminate machines and the system will self-heal.

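For illustration, a minimal sketch of a kOps InstanceGroup (name, machine type, and sizes are hypothetical); kOps renders this into the ASG/MIG, so replacing a terminated machine is the cloud's job:

```yaml
# Sketch: a kOps InstanceGroup (hypothetical values). kOps turns this
# into an AutoScaling Group on AWS or a Managed Instance Group on GCE,
# which replaces terminated machines automatically.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  role: Node
  machineType: t2.medium
  minSize: 2
  maxSize: 2
```
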
## nodeup: from image to kubelet

nodeup is the component that installs packages and sets up the OS, sufficiently for
Kubelet. The core requirements are:

* Docker must be installed. nodeup will install Docker 1.13.1, the version of Docker tested with Kubernetes 1.8.
* Kubelet, which is installed as a systemd service.

In addition, nodeup installs:

* Protokube, which is a kOps-specific component.

## /etc/kubernetes/manifests

kubelet starts pods as controlled by the files in /etc/kubernetes/manifests. These files are created
by nodeup and protokube (ideally all by protokube, but currently split between the two).

These pods are declared using the standard k8s manifests, just as if they were stored in the API.
But these are used to break the circular dependency for the bring-up of our core components, such
as etcd & kube-apiserver.

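As a sketch, such a file might look like this (image and flags are hypothetical); kubelet runs whatever it finds in this directory, without needing the API server:

```yaml
# Sketch: a static pod manifest as it might appear in
# /etc/kubernetes/manifests (hypothetical image and flags).
# Note hostNetwork: the pod network is not up yet, so the core
# components run on the host's network (see "kubelet start" below).
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: gcr.io/google_containers/kube-apiserver:v1.8.0
    command:
    - /usr/local/bin/kube-apiserver
    - --etcd-servers=http://127.0.0.1:4001
```
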
On masters:

* kube-apiserver
* kube-controller-manager (which runs miscellaneous controllers)
* kube-scheduler (which assigns pods to nodes)
* etcd (this is actually created by protokube though)
* dns-controller

On nodes:

* kube-proxy (which configures iptables so that the k8s-network will work)

It is possible to add custom static pods by using `fileAssets` in the
cluster spec. This might be useful for any custom bootstrapping that
doesn't fit into `additionalUserData` or `hooks`.

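A sketch of what that could look like (all values hypothetical):

```yaml
# Sketch: cluster spec fragment adding a custom static pod via
# fileAssets (hypothetical names/content). The file is written to the
# matching nodes, where kubelet picks it up like any other static pod.
spec:
  fileAssets:
  - name: my-static-pod
    path: /etc/kubernetes/manifests/my-static-pod.yaml
    roles:
    - Master
    content: |
      apiVersion: v1
      kind: Pod
      metadata:
        name: my-static-pod
        namespace: kube-system
      spec:
        hostNetwork: true
        containers:
        - name: main
          image: busybox
          command: ["sleep", "3600"]
```
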
## kubelet start

Kubelet starts up and starts (and restarts) all the containers in /etc/kubernetes/manifests.

It also tries to contact the API server (which the master kubelet will itself eventually start)
to register the node. Once a node is registered, kube-controller-manager will allocate it a PodCIDR,
which is an allocation of the k8s-network IP range. kube-controller-manager updates the node, setting
the PodCIDR field. Once kubelet sees this allocation, it will set up the
local bridge with this CIDR, which allows docker to start. Before this happens, only pods
that have hostNetwork will work - so all the "core" containers run with hostNetwork=true.

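To make that concrete, a sketch of a Node object after kube-controller-manager has allocated the PodCIDR (values hypothetical):

```yaml
# Sketch: a registered Node (hypothetical values). kube-controller-manager
# has written spec.podCIDR; kubelet watches for it and then configures
# the local bridge with that range.
apiVersion: v1
kind: Node
metadata:
  name: ip-172-20-33-97.ec2.internal
spec:
  podCIDR: 100.96.1.0/24
```
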
## api-server bringup

APIServer also listens on the HTTPS port (443) on all interfaces. This is a secured endpoint,
and requires valid authentication/authorization to use it. This is the endpoint that node kubelets
will reach, and also that end-users will reach.

kOps uses DNS to allow nodes and end-users to discover the api-server. The apiserver pod manifest (in
/etc/kubernetes/manifests) includes annotations that will cause the dns-controller to create the
records. It creates `api.internal.mycluster.com` for use inside the cluster (using InternalIP addresses),
and it creates `api.mycluster.com` for use outside the cluster (using ExternalIP addresses).

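As a sketch, the relevant fragment of the manifest might look like the following (these are the `dns.alpha.kubernetes.io/*` annotations documented for dns-controller; exact keys may vary by version):

```yaml
# Sketch: pod manifest fragment with dns-controller annotations
# (hypothetical record names; check the dns-controller docs for the
# exact annotation keys in your version).
metadata:
  name: kube-apiserver
  namespace: kube-system
  annotations:
    dns.alpha.kubernetes.io/internal: api.internal.mycluster.com
    dns.alpha.kubernetes.io/external: api.mycluster.com
```
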
## etcd bringup

etcd is where we have put all of our synchronization logic, so it is more complicated than most other pieces,
and we must be really careful when bringing it up.

kOps follows CoreOS's recommended procedure for [bring-up of etcd on clouds](https://github.com/coreos/etcd/issues/5418):

* We have one EBS volume for each etcd cluster member (on different nodes)
* We attach the EBS volume to a master, and bring up etcd on that master
* We set up DNS names pointing to the etcd process
* We set up etcd with a static cluster, with those DNS names (see the sketch below)

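A sketch of that last step (hypothetical names and ports): the manifest pins a static member list of the stable per-member DNS names, so membership never changes even as machines are replaced:

```yaml
# Sketch: etcd container environment in the manifest protokube writes
# (hypothetical values; etcd's standard ETCD_* variables). Membership
# is fixed to stable DNS names rather than machine IPs.
env:
- name: ETCD_NAME
  value: etcd-a
- name: ETCD_INITIAL_CLUSTER
  value: etcd-a=http://etcd-a.internal.mycluster.com:2380,etcd-b=http://etcd-b.internal.mycluster.com:2380,etcd-c=http://etcd-c.internal.mycluster.com:2380
- name: ETCD_INITIAL_CLUSTER_STATE
  value: new
```
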
Because the data is persistent and the cluster membership is also a static set of DNS names, this
means we don't need to manage etcd directly. We just try to make sure that some master always has
each volume mounted with etcd running and DNS set correctly. That is the job of protokube.

Protokube:

* discovers EBS volumes that hold etcd data (using tags)
* tries to safe_format_and_mount them
* if successful in mounting the volume, it will write a manifest for etcd into /etc/kubernetes/manifests
* configures DNS for the etcd nodes (we can't use dns-controller, because the API is not yet up)
* kubelet then starts and runs etcd

## node bringup

Most of this has focused on what happens on the master, but node bringup is very similar, just simpler:

* nodeup installs docker & kubelet
* in /etc/kubernetes/manifests, we have kube-proxy

So kubelet will start up, as will kube-proxy. It will try to reach the api-server on the internal DNS name,
and once the master is up it will succeed. Then:

* kubelet creates a Node object representing itself
* kube-controller-manager sees the node creation and assigns it a PodCIDR
* kubelet sees the PodCIDR assignment and configures the local docker bridge (cbr0)
* the node will be marked as Ready, and kube-scheduler will start assigning pods to the node
* when kubelet sees assigned pods it will run them
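
For reference, readiness surfaces on the Node object roughly like this (a sketch with hypothetical values); kube-scheduler only considers Ready nodes:

```yaml
# Sketch: Node status fragment once kubelet reports healthy
# (hypothetical values).
status:
  conditions:
  - type: Ready
    status: "True"
    reason: KubeletReady
```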