From 263fc03201352488ff5efee2bbbad3bcc6cd1e6b Mon Sep 17 00:00:00 2001
From: Joe Betz
Date: Mon, 15 Aug 2022 22:27:07 -0400
Subject: [PATCH] Include how to route away from broken etcd in etcd maintenance docs (#35882)

* Include how to route away from broken etcd in etcd maintenance docs

* Apply suggestions from code review

Apply suggestions and use 1. for all numbering (markdown will set the numbering automatically this way)

Co-authored-by: Han Kang
Co-authored-by: Jihoon Seo <46767780+jihoon-seo@users.noreply.github.com>

* Update content/en/docs/tasks/administer-cluster/configure-upgrade-etcd.md

Co-authored-by: Jihoon Seo <46767780+jihoon-seo@users.noreply.github.com>

Co-authored-by: Han Kang
Co-authored-by: Jihoon Seo <46767780+jihoon-seo@users.noreply.github.com>
---
 .../configure-upgrade-etcd.md                 | 36 ++++++++++++++-----
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/content/en/docs/tasks/administer-cluster/configure-upgrade-etcd.md b/content/en/docs/tasks/administer-cluster/configure-upgrade-etcd.md
index be77074dc1..3a5771e36e 100644
--- a/content/en/docs/tasks/administer-cluster/configure-upgrade-etcd.md
+++ b/content/en/docs/tasks/administer-cluster/configure-upgrade-etcd.md
@@ -2,6 +2,7 @@
 reviewers:
 - mml
 - wojtek-t
+- jpbetz
 title: Operating etcd clusters for Kubernetes
 content_type: task
 ---
@@ -187,7 +188,21 @@ replace it with `member4=http://10.0.0.4`.
    fd422379fda50e48, started, member3, http://10.0.0.3:2380, http://10.0.0.3:2379
    ```
 
-2. Remove the failed member:
+1. Do either of the following:
+
+   1. If each Kubernetes API server is configured to communicate with all etcd
+      members, remove the failed member from the `--etcd-servers` flag, then
+      restart each Kubernetes API server.
+   1. If each Kubernetes API server communicates with a single etcd member,
+      then stop the Kubernetes API server that communicates with the failed
+      etcd.
+
+1. Stop the etcd server on the broken node. It is possible that other
+   clients besides the Kubernetes API server are causing traffic to etcd
+   and it is desirable to stop all traffic to prevent writes to the data
+   dir.
+
+1. Remove the failed member:
 
    ```shell
    etcdctl member remove 8211f1d0f64f3269
@@ -199,7 +214,7 @@ replace it with `member4=http://10.0.0.4`.
    Removed member 8211f1d0f64f3269 from cluster
    ```
 
-3. Add the new member:
+1. Add the new member:
 
    ```shell
    etcdctl member add member4 --peer-urls=http://10.0.0.4:2380
@@ -211,7 +226,7 @@ replace it with `member4=http://10.0.0.4`.
    Member 2be1eb8f84b7f63e added to cluster ef37ad9dc622a7c4
    ```
 
-4. Start the newly added member on a machine with the IP `10.0.0.4`:
+1. Start the newly added member on a machine with the IP `10.0.0.4`:
 
    ```shell
    export ETCD_NAME="member4"
@@ -220,13 +235,16 @@ replace it with `member4=http://10.0.0.4`.
    etcd [flags]
    ```
 
-5. Do either of the following:
+1. Do either of the following:
 
-   1. Update the `--etcd-servers` flag for the Kubernetes API servers to make
-      Kubernetes aware of the configuration changes, then restart the
-      Kubernetes API servers.
-   2. Update the load balancer configuration if a load balancer is used in the
-      deployment.
+   1. If each Kubernetes API server is configured to communicate with all etcd
+      members, add the newly added member to the `--etcd-servers` flag, then
+      restart each Kubernetes API server.
+   1. If each Kubernetes API server communicates with a single etcd member,
+      start the Kubernetes API server that was stopped in step 2. Then
+      configure Kubernetes API server clients to again route requests to the
+      Kubernetes API server that was stopped. This can often be done by
+      configuring a load balancer.
 
 For more information on cluster reconfiguration, see
 [etcd reconfiguration documentation](https://etcd.io/docs/current/op-guide/runtime-configuration/#remove-a-member).