# Etcd

## Backing up etcd

Kubernetes relies on etcd for state storage. More details about the usage
can be found [here](https://kubernetes.io/docs/admin/etcd/) and
[here](https://coreos.com/etcd/docs/latest/).

### Backup requirement

A Kubernetes cluster deployed with kOps stores the etcd state in two different
AWS EBS volumes per master node. One volume is used to store the Kubernetes
main data, the other one for events. For an HA master with three nodes this will
result in six volumes for etcd data (one in each AZ). An EBS volume is designed
to have a [failure rate](https://aws.amazon.com/ebs/details/#AvailabilityandDurability)
of 0.1%-0.2% per year.
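
As a rough illustration, if each of the six volumes independently fails with probability 0.2% per year, the chance that at least one of them fails within a year is about 1 − 0.998^6 ≈ 1.2%, which is why regular backups are essential.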

## Taking backups

Backups are done periodically and before cluster modifications using [etcd-manager](etcd_administration.md)
(introduced in kOps 1.12). Backups for both the `main` and `events` etcd clusters
are stored in object storage (like S3) together with the cluster configuration.
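
For example, with a cluster named `test.my.clusters` whose state store lives in the S3 bucket `my.clusters` (the same example used below), the stored backups can be listed directly:

```sh
# Backups live under the cluster prefix in the state store bucket
aws s3 ls s3://my.clusters/test.my.clusters/backups/etcd/main/
aws s3 ls s3://my.clusters/test.my.clusters/backups/etcd/events/
```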

By default, backups are taken every 15 minutes. Hourly backups are kept for 1 week and
daily backups are kept for 90 days (or 2 years before kOps 1.27), before being automatically removed.
The retention duration for backups [can be adjusted](../cluster_spec.md#etcd-backups-retention)
to suit other needs.
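
As a minimal sketch of what such an adjustment can look like (assuming the `backupRetentionDays` field introduced around kOps 1.27; see the linked cluster spec documentation for the authoritative syntax), lowering the daily backup retention for the `main` cluster might be expressed as:

```yaml
# Hypothetical excerpt of the cluster spec; verify the field against the linked docs.
etcdClusters:
- etcdMembers:
  - instanceGroup: master-us-east-1a
    name: a
  manager:
    backupRetentionDays: 30
  name: main
```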

## Restore backups

In case of a disaster with etcd (lost data, cluster issues, etc.), you can restore the etcd cluster using `etcd-manager-ctl`.
You can download the `etcd-manager-ctl` binary from the [etcd-manager repository](https://github.com/kopeio/etcd-manager/releases).
It is not necessary to run `etcd-manager-ctl` in your cluster, as long as you have access to the cluster state storage (like S3).
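
For example, on a Linux amd64 machine the download might look like this (the release version and asset name are illustrative; check the releases page for the binary that matches your platform):

```sh
# Illustrative release version and asset name; pick a real release from the repository
VERSION=3.0.20200531
wget "https://github.com/kopeio/etcd-manager/releases/download/${VERSION}/etcd-manager-ctl-linux-amd64"
chmod +x etcd-manager-ctl-linux-amd64
sudo mv etcd-manager-ctl-linux-amd64 /usr/local/bin/etcd-manager-ctl
```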

Please note that this process involves downtime for your masters (and therefore the API server).
A restore cannot be undone (except by restoring again), and you might lose pods, events
and other resources that were created after the backup.

For this example, we assume we have a cluster named `test.my.clusters` in an S3 bucket called `my.clusters`.

List the backups that are stored in your state store (note that backup files are different for the `main` and `events` clusters):

```sh
etcd-manager-ctl --backup-store=s3://my.clusters/test.my.clusters/backups/etcd/main list-backups
etcd-manager-ctl --backup-store=s3://my.clusters/test.my.clusters/backups/etcd/events list-backups
```

Add a restore command for both clusters:

```sh
etcd-manager-ctl --backup-store=s3://my.clusters/test.my.clusters/backups/etcd/main restore-backup [main backup dir]
etcd-manager-ctl --backup-store=s3://my.clusters/test.my.clusters/backups/etcd/events restore-backup [events backup dir]
```
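
For example, if `list-backups` reported a backup named `2024-01-15T00:00:00Z-000001` for each cluster (a hypothetical name), the commands would be:

```sh
etcd-manager-ctl --backup-store=s3://my.clusters/test.my.clusters/backups/etcd/main restore-backup 2024-01-15T00:00:00Z-000001
etcd-manager-ctl --backup-store=s3://my.clusters/test.my.clusters/backups/etcd/events restore-backup 2024-01-15T00:00:00Z-000001
```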

Note that this does not start the restore immediately; you need to restart etcd on all masters.
You can do this with a `docker stop` or `kill` on the etcd-manager containers on the masters (the container names start with `k8s_etcd-manager_etcd-manager`).
The etcd-manager containers should restart automatically, and pick up the restore command. You also have the option to roll your masters quickly, but restarting the containers is preferred.
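
A minimal sketch of that restart on one master, assuming the Docker runtime mentioned above:

```sh
# List the etcd-manager containers for the main and events clusters
docker ps | grep k8s_etcd-manager_etcd-manager

# Stop them (substitute the container IDs from the listing above); the kubelet
# restarts them automatically and they pick up the pending restore command
docker stop <main-container-id> <events-container-id>
```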

A new etcd cluster will be created and the backup will be
restored onto this new cluster. Please note that this process might take a short while,
depending on the size of your cluster.

You can follow the progress by reading the etcd logs (`/var/log/etcd(-events).log`)
on the master that is the leader of the cluster (you can find this out by checking the etcd logs on all masters).
Note that the leader might be different for the `main` and `events` clusters.
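
For example, to follow both logs on a master:

```sh
tail -f /var/log/etcd.log /var/log/etcd-events.log
```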

## Verify master lease consistency

[This bug](https://github.com/kubernetes/kubernetes/issues/86812) causes old apiserver leases to get stuck. In order to recover from this, you need to remove the leases from etcd directly.

To verify if you are affected by this bug, check the endpoints resource of the Kubernetes apiserver, like this:

```sh
kubectl get endpoints/kubernetes -o yaml
```
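
On a three-master cluster affected by the bug, the relevant part of the output could look something like this (the IP addresses are illustrative):

```yaml
# Illustrative output fragment; the fourth address is a stale lease
# left behind by a replaced master.
subsets:
- addresses:
  - ip: 172.20.1.10
  - ip: 172.20.2.11
  - ip: 172.20.3.12
  - ip: 172.20.4.99
```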

If you see more addresses than you have masters, you will need to remove the stale ones manually inside the etcd cluster.

See [etcd administration](/operations/etcd_administration) for how to obtain access to the etcd cluster.

Once you have a working etcd client, run the following to list the apiserver leases:

```sh
etcdctl get --prefix --keys-only /registry/masterleases
```
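
Each key under `/registry/masterleases/` is named after the IP address of an apiserver. To remove a single stale lease instead of all of them, delete just that key (the IP below is hypothetical):

```sh
etcdctl del /registry/masterleases/172.20.4.99
```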

You can also delete all of the leases in one go:

```sh
etcdctl del --prefix /registry/masterleases/
```

The remaining apiservers will immediately recreate their own leases. Check the above-mentioned endpoints resource again to verify that the problem has been solved.

Because the state on each of the nodes may differ from the state in etcd, it is also a good idea to do a rolling update of the entire cluster:

```sh
kops rolling-update cluster --force --yes
```

For more information and troubleshooting, please check the [etcd-manager documentation](https://github.com/kubernetes-sigs/etcdadm/tree/master/etcd-manager).

## Etcd Volume Encryption

You must configure etcd volume encryption before bringing up your cluster. You cannot add etcd volume encryption to an already running cluster.

### Encrypting Etcd Volumes Using the Default AWS KMS Key

Edit your cluster to add `encryptedVolume: true` to each etcd volume:

`kops edit cluster ${CLUSTER_NAME}`

```yaml
...
etcdClusters:
- etcdMembers:
  - instanceGroup: master-us-east-1a
    name: a
    encryptedVolume: true
  name: main
- etcdMembers:
  - instanceGroup: master-us-east-1a
    name: a
    encryptedVolume: true
  name: events
...
```

Update your cluster:

```sh
kops update cluster ${CLUSTER_NAME}
# Review changes before applying
kops update cluster ${CLUSTER_NAME} --yes
```

### Encrypting Etcd Volumes Using a Custom AWS KMS Key

Edit your cluster to add `encryptedVolume: true` and `kmsKeyId` to each etcd volume:

`kops edit cluster ${CLUSTER_NAME}`

```yaml
...
etcdClusters:
- etcdMembers:
  - instanceGroup: master-us-east-1a
    name: a
    encryptedVolume: true
    kmsKeyId: <full-arn-of-your-kms-key>
  name: main
- etcdMembers:
  - instanceGroup: master-us-east-1a
    name: a
    encryptedVolume: true
    kmsKeyId: <full-arn-of-your-kms-key>
  name: events
...
```

Update your cluster:

```sh
kops update cluster ${CLUSTER_NAME}
# Review changes before applying
kops update cluster ${CLUSTER_NAME} --yes
```