# Backing up etcd

Kubernetes relies on etcd for state storage. More details about its usage can
be found [here](https://kubernetes.io/docs/admin/etcd/) and
[here](https://coreos.com/etcd/docs/2.3.7/index.html).

## Backup requirement

A Kubernetes cluster deployed with kops stores the etcd state in two different
AWS EBS volumes per master node. One volume is used to store the main
Kubernetes data, the other one the events. For an HA setup with three master
nodes this results in six volumes for etcd data (two in each AZ). An EBS volume
is designed to have a [failure rate](https://aws.amazon.com/ebs/details/#AvailabilityandDurability)
of 0.1%-0.2% per year.

## Create volume backups

Kubernetes does not currently provide any out-of-the-box option for taking
regular backups of etcd.

We therefore have to either back up the etcd volumes manually on a regular
basis or use other AWS services to do this in an automated, scheduled way. For
example, you can use CloudWatch to trigger an AWS Lambda function on a defined
schedule (e.g. once per hour); the Lambda function then creates a new snapshot
of all etcd volumes. A complete guide on how to set up automated snapshots can
be found [here](https://serverlesscode.com/post/lambda-schedule-ebs-snapshot-backups/).

Note: this is only one of many ways to take scheduled snapshots.
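
As an illustration, the snapshot step performed by such a scheduled job could
look like the following minimal sketch using the `boto3` library. The cluster
name and region are assumptions, and selecting the volumes by their kops tags
is an assumption as well, not something the linked guide prescribes.

```python
# Minimal sketch: snapshot all etcd volumes of a kops cluster.
# Assumes boto3 is available, AWS credentials are configured, and the cluster
# name and region below are placeholders for your own values.
import boto3

CLUSTER = "k8s.mycompany.tld"   # assumed cluster name
REGION = "eu-central-1"         # assumed region

ec2 = boto3.client("ec2", region_name=REGION)

# Find the etcd volumes via the tags kops puts on them.
volumes = ec2.describe_volumes(
    Filters=[
        {"Name": "tag:KubernetesCluster", "Values": [CLUSTER]},
        {"Name": "tag:k8s.io/role/master", "Values": ["1"]},
    ]
)["Volumes"]

for volume in volumes:
    name = next((t["Value"] for t in volume.get("Tags", []) if t["Key"] == "Name"), "")
    snapshot = ec2.create_snapshot(
        VolumeId=volume["VolumeId"],
        Description="etcd backup of {} ({})".format(name, CLUSTER),
    )
    print("created snapshot", snapshot["SnapshotId"], "for", volume["VolumeId"])
```
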
## Restore volume backups

If the Kubernetes cluster fails in a way that too many master nodes cannot
access their etcd volumes, it is impossible to reach an etcd quorum.

In this case it is possible to restore the volumes from the snapshots we
created earlier. Details about creating a volume from a snapshot can be found
in the [AWS documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-restoring-volume.html).
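
If you would rather script this step than use the console, a minimal `boto3`
sketch could look as follows; the snapshot ID, region, and availability zone
are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")  # assumed region

# Create a new volume from an earlier etcd snapshot. The snapshot ID is a
# placeholder; the availability zone must match the master that will use it.
new_volume = ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",
    AvailabilityZone="eu-central-1a",
)
print("created volume", new_volume["VolumeId"])
```
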
Kubernetes uses protokube to identify the right volumes for etcd. It is
therefore important to apply the correct tags to the EBS volumes after
restoring them from an EBS snapshot; a tagging sketch follows the list below.

protokube will look for the following tags:

* `KubernetesCluster` containing the cluster name (e.g. `k8s.mycompany.tld`)
* `Name` containing the volume name (e.g. `eu-central-1a.etcd-main.k8s.mycompany.tld`)
* `k8s.io/etcd/main` containing the availability zone of the volume (e.g. `eu-central-1a/eu-central-1a`)
* `k8s.io/role/master` with the value `1`
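
A minimal `boto3` sketch of applying these tags to a restored volume; the
volume ID and region are placeholders, and the tag values are the examples
from the list above.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")  # assumed region

# Tag the restored volume so protokube can find it again. The volume ID is a
# placeholder; the tag values are the examples from the list above.
ec2.create_tags(
    Resources=["vol-0123456789abcdef0"],
    Tags=[
        {"Key": "KubernetesCluster", "Value": "k8s.mycompany.tld"},
        {"Key": "Name", "Value": "eu-central-1a.etcd-main.k8s.mycompany.tld"},
        {"Key": "k8s.io/etcd/main", "Value": "eu-central-1a/eu-central-1a"},
        {"Key": "k8s.io/role/master", "Value": "1"},
    ],
)
```
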
After fully restoring the volume, ensure that the old volume is no longer
there, or that you have removed its tags. After restarting the master node,
Kubernetes should pick up the new volume and start running again.
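
If the old volume still exists, one way to take it out of consideration is to
strip the protokube tags from it, for example with the following minimal
`boto3` sketch; the volume ID and region are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")  # assumed region

# Remove the protokube tags from the old, broken volume (placeholder volume ID)
# so it is no longer considered. Omitting "Value" deletes the tag regardless
# of its current value.
ec2.delete_tags(
    Resources=["vol-0abcdef0123456789"],
    Tags=[
        {"Key": "KubernetesCluster"},
        {"Key": "Name"},
        {"Key": "k8s.io/etcd/main"},
        {"Key": "k8s.io/role/master"},
    ],
)
```
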