Include how to route away from broken etcd in etcd maintenance docs (#35882)

* Include how to route away from broken etcd in etcd maintenance docs

* Apply suggestions from code review

Apply suggestions and use 1. for all numbering (markdown will set the numbering automatically this way)

Co-authored-by: Han Kang <hankang@google.com>
Co-authored-by: Jihoon Seo <46767780+jihoon-seo@users.noreply.github.com>

* Update content/en/docs/tasks/administer-cluster/configure-upgrade-etcd.md

Co-authored-by: Jihoon Seo <46767780+jihoon-seo@users.noreply.github.com>

Co-authored-by: Han Kang <hankang@google.com>
Co-authored-by: Jihoon Seo <46767780+jihoon-seo@users.noreply.github.com>
Joe Betz 2022-08-15 22:27:07 -04:00 committed by GitHub
parent 59cd910ec5
commit 263fc03201
1 changed files with 27 additions and 9 deletions


@@ -2,6 +2,7 @@
reviewers:
- mml
- wojtek-t
- jpbetz
title: Operating etcd clusters for Kubernetes
content_type: task
---
@@ -187,7 +188,21 @@ replace it with `member4=http://10.0.0.4`.
fd422379fda50e48, started, member3, http://10.0.0.3:2380, http://10.0.0.3:2379
```
2. Remove the failed member:
1. Do either of the following:
1. If each Kubernetes API server is configured to communicate with all etcd
members, remove the failed member from the `--etcd-servers` flag, then
restart each Kubernetes API server.
1. If each Kubernetes API server communicates with a single etcd member,
then stop the Kubernetes API server that communicates with the failed
etcd.
1. Stop the etcd server on the broken node. It is possible that other
clients besides the Kubernetes API server are causing traffic to etcd,
and it is desirable to stop all traffic to prevent writes to the data
directory.
1. Remove the failed member:
```shell
etcdctl member remove 8211f1d0f64f3269
@@ -199,7 +214,7 @@ replace it with `member4=http://10.0.0.4`.
Removed member 8211f1d0f64f3269 from cluster
```
3. Add the new member:
1. Add the new member:
```shell
etcdctl member add member4 --peer-urls=http://10.0.0.4:2380
@@ -211,7 +226,7 @@ replace it with `member4=http://10.0.0.4`.
Member 2be1eb8f84b7f63e added to cluster ef37ad9dc622a7c4
```
4. Start the newly added member on a machine with the IP `10.0.0.4`:
1. Start the newly added member on a machine with the IP `10.0.0.4`:
```shell
export ETCD_NAME="member4"
@@ -220,13 +235,16 @@ replace it with `member4=http://10.0.0.4`.
etcd [flags]
```
5. Do either of the following:
1. Do either of the following:
1. Update the `--etcd-servers` flag for the Kubernetes API servers to make
Kubernetes aware of the configuration changes, then restart the
Kubernetes API servers.
2. Update the load balancer configuration if a load balancer is used in the
deployment.
1. If each Kubernetes API server is configured to communicate with all etcd
members, add the newly added member to the `--etcd-servers` flag, then
restart each Kubernetes API server.
1. If each Kubernetes API server communicates with a single etcd member,
start the Kubernetes API server that was stopped in step 2. Then
configure Kubernetes API server clients to again route requests to the
Kubernetes API server that was stopped. This can often be done by
configuring a load balancer.
For more information on cluster reconfiguration, see
[etcd reconfiguration documentation](https://etcd.io/docs/current/op-guide/runtime-configuration/#remove-a-member).
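
The `--etcd-servers` edits described in the new steps (drop the failed member before removing it, add the replacement once it is running) can be sketched as plain string handling on the comma-separated flag value. This is a minimal sketch, not part of the commit: the helper names `remove_etcd_server` and `add_etcd_server` are hypothetical, and the client URLs are assumptions (that the failed member served clients at `http://10.0.0.1:2379`, and that `member4` listens for clients on port `2379`).

```shell
# Hypothetical helper: drop one endpoint from a comma-separated
# --etcd-servers value. Only an exact match of the endpoint is removed.
remove_etcd_server() {
  local servers="$1" failed="$2"
  printf '%s\n' "$servers" | tr ',' '\n' | grep -Fxv -- "$failed" | paste -sd, -
}

# Hypothetical helper: append one endpoint unless it is already listed.
add_etcd_server() {
  local servers="$1" new="$2"
  case ",$servers," in
    *",$new,"*) printf '%s\n' "$servers" ;;
    *) printf '%s\n' "${servers:+$servers,}$new" ;;
  esac
}

# Assumed starting value: three members, the first of which has failed.
servers="http://10.0.0.1:2379,http://10.0.0.2:2379,http://10.0.0.3:2379"

# Before removing the failed member: route the API server away from it.
servers="$(remove_etcd_server "$servers" "http://10.0.0.1:2379")"

# After the new member has started: add its (assumed) client URL.
servers="$(add_etcd_server "$servers" "http://10.0.0.4:2379")"

echo "--etcd-servers=$servers"
```

The helpers only compute the new flag value. Writing it into the API server configuration and restarting each Kubernetes API server still depends on how the control plane is deployed.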