From cfee407c37e79a3837029de94cc338d090607c5b Mon Sep 17 00:00:00 2001
From: Anne Henmi
Date: Tue, 30 Oct 2018 18:10:57 -0600
Subject: [PATCH 01/13] Initial work on upgrade.md for issue 802.

---
 ee/upgrade.md | 111 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 108 insertions(+), 3 deletions(-)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index 464683875f..84013ff91d 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -59,9 +59,12 @@ This ensures that your containers are started automatically after the upgrade.

To ensure that workloads running as Swarm services have no downtime, you need to:

-1. Drain the node you want to upgrade so that services get scheduled in another node.
-2. Upgrade the Docker Engine on that node.
-3. Make the node available again.
+1. Determine if the network is in danger of exhaustion:
+   a. Triage and fix an upgrade that exhausted IP address space, or
+   b. Upgrade a service network live to add IP addresses.
+3. Drain the node you want to upgrade so that services get scheduled on another node.
+4. Upgrade the Docker Engine on that node.
+5. Make the node available again.

If you do this sequentially for every node, you can upgrade with no application downtime.

@@ -69,6 +72,108 @@ When upgrading manager nodes, make sure the upgrade of a node finishes before
you start upgrading the next node. Upgrading multiple manager nodes at the same
time can lead to a loss of quorum, and possible data loss.

+### Determine if the network is in danger of exhaustion
+
+Starting with a cluster with one or more services configured, determine whether some networks
+may require updates in order to function correctly after an 18.09 upgrade.
+
+1. SSH into a manager node.
+
+2. Fetch and deploy a service that would exhaust IP addresses in one of its overlay networks.
+
+3. Check the `docker service ls` output.
It will display the service that is unable to completely fill all its replicas, such as:

```
ID            NAME        MODE        REPLICAS  IMAGE         PORTS
wn3x4lu9cnln  ex_service  replicated  19/24     nginx:latest
```

4. Use `docker service ps ex_service` to find a failed replica, such as:

```
ID            NAME              IMAGE         NODE             DESIRED STATE  CURRENT STATE           ERROR                              PORTS
 ...
i64lee19ia6s  \_ ex_service.11  nginx:latest  tk1706-ubuntu-1  Shutdown       Rejected 7 minutes ago  "node is missing network attac…"
 ...
```

5. Examine the error using `docker inspect`. In this example, the `docker inspect i64lee19ia6s` output shows the error in the `Status.Err` field:

```
...
    "Status": {
        "Timestamp": "2018-08-24T21:03:37.885405884Z",
        "State": "rejected",
        "Message": "preparing",
        "Err": "node is missing network attachments, ip addresses may be exhausted",
        "ContainerStatus": {
            "ContainerID": "",
            "PID": 0,
            "ExitCode": 0
        },
        "PortStatus": {}
    },
...
```

#### Triage and fix an upgrade that exhausted IP address space

Starting with a cluster with services that exhaust their overlay address space in 18.09, adjust the deployment to fix this issue.

1. Adjust the `- subnet:` field in `docker-compose.yml` to have a larger subnet, such as `- subnet: 10.1.1.0/22`.

2. Remove the original service and re-deploy with the new compose file. Confirm the adjusted service deployed successfully.

#### Upgrade a service network live to add IP addresses

Identify a subnet with few remaining IP addresses in a live service and upgrade the network live to add IP addresses.

1. SSH into a manager node.

2. Fetch and deploy a service that has very few IP addresses available in one of its overlay networks.

3. Run the following to determine if the subnet is near capacity:

```
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock ctelfer/ip-util-check
```

4. Run the following to create a new subnet for the services on the overloaded subnet XXX.
Substitute the overloaded network name for XXX:

```
$ docker network create -d overlay --subnet=10.252.0.0/16 XXX_bump_addrs
```

5. Run the following for each service to add the new network to the service:

```
$ docker service update --detach=false --network-add XXX_bump_addrs ex_serviceY
```

6. Run the following for each service attached to XXX to remove the overloaded network from the service:

```
$ docker service update --detach=false --network-rm XXX ex_serviceY
```

7. Run the following to remove the now unused network:

```
$ docker network rm XXX
```

8. Repeat the process of adding a new network with fresh address space, but name it the same as the original overloaded subnet.
Then remove the "XXX_bump_addrs" subnet from each service. This leaves all services attached to a network named XXX, but with an
increased pool of addresses.

9. Run the following to confirm that subnet allocations are satisfactory:

```
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock ctelfer/ip-util-check
```

### Drain the node

Start by draining the node so that services get scheduled in another node and

From 59aad4df9b7156c7c8bada250eb788a580ee8009 Mon Sep 17 00:00:00 2001
From: Anne Henmi
Date: Tue, 30 Oct 2018 18:34:18 -0600
Subject: [PATCH 02/13] Incorporated scenarios 1-4 completely.

---
 ee/upgrade.md | 102 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 93 insertions(+), 9 deletions(-)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index 84013ff91d..00486468e6 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -79,7 +79,38 @@ may require update in order to function correctly after an 18.09 upgrade.

 1. SSH into a manager node.

-2. Fetch and deploy a service that would exhaust IP addresses in one of its overlay networks.
+2.
Fetch and deploy a service that would exhaust IP addresses in one of its overlay networks, such as [this compose file](https://raw.githubusercontent.com/ctelfer/moby-lb-upgrade-test/master/low_addrs/docker-compose.yml).

3. Run the following:

```
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock ctelfer/ip-util-check
```

If the network is in danger of exhaustion, the output will show warnings or errors similar to:

```
 Overlay IP Utilization Report
 ----
 Network ex_net1/XXXXXXXXXXXX has an IP address capacity of 29 and uses 28 addresses
 ERROR: network will be over capacity if upgrading Docker engine version 18.06
 or later.
 ----
 Network ex_net2/YYYYYYYYYYYY has an IP address capacity of 29 and uses 24 addresses
 WARNING: network could exhaust IP addresses if the cluster scales to 5 or more nodes
 ----
 Network ex_net3/ZZZZZZZZZZZZ has an IP address capacity of 61 and uses 52 addresses
 WARNING: network could exhaust IP addresses if the cluster scales to 9 or more nodes
```

#### Triage and fix an upgrade that exhausted IP address space

Starting with a cluster with services that exhaust their overlay address space in 18.09, adjust the deployment to fix this issue.

1. SSH into a manager node.

2. Fetch and deploy a service that exhausts IP addresses in one of its overlay networks, such as [this compose file](https://raw.githubusercontent.com/ctelfer/moby-lb-upgrade-test/master/exhaust_addrs_3_nodes/docker-compose.yml).

3. Check the `docker service ls` output. It will display the service that is unable to completely fill all its replicas, such as:

@@ -116,13 +147,9 @@ i64lee19ia6s \_ ex_service.11 nginx:latest tk1706-ubuntu-1
 ...
 ```

-#### Triage and fix an upgrade that exhausted IP address space
+6. Adjust the `- subnet:` field in `docker-compose.yml` to have a larger subnet, such as `- subnet: 10.1.1.0/22`.

-Starting with a cluster with services that exhaust their overlay address space in 18.09, adjust the deployment to fix this issue.
-
-1.
Adjust the `- subnet:` field in `docker-compose.yml` to have a larger subnet, such as `- subnet: 10.1.1.0/22`.

-2. Remove the original service and re-deploy with the new compose file. Confirm the adjusted service deployed successfully.
+7. Remove the original service and re-deploy with the new compose file. Confirm the adjusted service deployed successfully.

 #### Upgrade a service network live to add IP addresses

@@ -174,6 +201,63 @@ $ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock ctelfer/ip-util-check
 ```

### Perform a hit-less upgrade

To upgrade an entire Docker environment, use the following steps.

1. SSH into the manager node.

2. Promote two other nodes to manager:

```
$ docker node promote manager1
$ docker node promote manager2
```

3. Start a stack with clients connecting to services. For example:

```
$ curl "https://raw.githubusercontent.com/ctelfer/moby-lb-upgrade-test/master/upgrade_test_ct/docker-compose.yml" > docker-compose.yml
$ docker stack deploy --compose-file docker-compose.yml test
```

4. Upgrade all subsequent managers:

   a. SSH into each manager.

   b. Drain containers from the node:

   ```
   $ docker node update --availability drain $(docker node ls | grep managerY | awk '{print $1}')
   ```

   c. Verify containers have been moved off:

   ```
   $ docker container ls
   ```

   d. Upgrade Docker to 18.09 on the system.

5. After upgrading all the managers, reactivate all the nodes:

   a. SSH into each manager.

   b. Run the following to update all the nodes:

   ```
   $ for m in manager0 manager1 manager2 ; do \
       docker node update --availability active $(docker node ls | grep $m | awk '{print $1}') ; \
     done
   ```

6. Repeat the steps above for each worker but with two differences:
   a. You must drain and activate the workers from a manager.
   b. It is possible to reactivate each worker as soon as the upgrade for that worker is done.
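The ordering that the hit-less upgrade above relies on can be summarized programmatically. The sketch below is an illustration added for this review, not part of the patch; the node names and the `upgrade-engine` placeholder are hypothetical, and only the `docker node update` strings correspond to real commands from the steps above:

```python
# Hypothetical sketch (not from the patch): the per-node ordering the hit-less
# upgrade follows. Managers are drained and upgraded one at a time and only
# reactivated after ALL managers are done; each worker may be reactivated as
# soon as its own upgrade finishes.
def upgrade_plan(managers, workers):
    plan = []
    for m in managers:
        plan.append(f"docker node update --availability drain {m}")
        plan.append(f"upgrade-engine {m}")  # placeholder for the OS-level engine upgrade
    # managers come back only after every manager is upgraded
    plan += [f"docker node update --availability active {m}" for m in managers]
    for w in workers:
        plan.append(f"docker node update --availability drain {w}")
        plan.append(f"upgrade-engine {w}")
        plan.append(f"docker node update --availability active {w}")  # safe immediately
    return plan

plan = upgrade_plan(["manager0", "manager1"], ["worker0"])
```

The key property, which the troubleshooting section later depends on, is that no manager is reactivated before the last manager upgrade completes.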
### Drain the node

Start by draining the node so that services get scheduled in another node and

@@ -186,8 +270,8 @@ docker node update --availability drain

 ### Perform the upgrade

-Upgrade Docker Engine on the node by following the instructions for your
-specific distribution:
+To upgrade a node individually by operating system, please follow the instructions
+listed below:

 * [Windows Server](/install/windows/docker-ee.md#update-docker-ee)
 * [Ubuntu](/install/linux/docker-ee/ubuntu.md#upgrade-docker-ee)

From 20fbf90d820380d14860d12d91818632871af163 Mon Sep 17 00:00:00 2001
From: Anne Henmi
Date: Tue, 30 Oct 2018 18:44:56 -0600
Subject: [PATCH 03/13] Added troubleshooting (scenarios 5 and 6).

---
 ee/upgrade.md | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index 00486468e6..29c0a17d4f 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -59,9 +59,9 @@ This ensures that your containers are started automatically after the upgrade.

To ensure that workloads running as Swarm services have no downtime, you need to:

-1. Determine if the network is in danger of exhaustion
+1. Determine if the network is in danger of exhaustion, then
   a. Triage and fix an upgrade that exhausted IP address space, or
-  b. Upgrade a service network live to add IP addresses
+  b. Upgrade a service network live to add IP addresses, or
3. Drain the node you want to upgrade so that services get scheduled in another node.
4. Upgrade the Docker Engine on that node.
5. Make the node available again.

@@ -257,6 +257,41 @@ docker stack deploy --compose-file docker-compose.yml test
   a. You must drain and activate the workers from a manager.
   b. It is possible to reactivate each worker as soon as the upgrade for that worker is done.

+### Troubleshooting the hit-less upgrade
+
+If you re-activate a manager immediately instead of waiting for upgrades to the other managers, do the following:
+
+1.
Run the following and observe rejected tasks:

```
$ docker service ps <service>
```

2. Run the following, where XXX is the ID of one of the service tasks, and look for `"Err": "node is missing network attachments, ip addresses may be exhausted"`:

```
$ docker inspect XXX
```

3. Finish the upgrade, and the service will resume.

If you forgot to drain the managers first, do the following:

1. Run the following, and observe rejected tasks with an error `"cannot create a swarm scoped …"`:

```
$ docker service ps <service>
```

2. Run the following, where XXX is the ID of one of the service tasks, and look for `"Err": "cannot create a swarm scoped network when swarm is not active"`:

```
$ docker inspect XXX
```

3. Finish the upgrade, and the service will resume.

### Drain the node

From 6b12d4615005b09748b1b40bbddc197a91121bab Mon Sep 17 00:00:00 2001
From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com>
Date: Thu, 1 Nov 2018 07:43:50 -0600
Subject: [PATCH 04/13] Update upgrade.md

Resolved @ddeyo's comments and all but two of @mark-church's.

---
 ee/upgrade.md | 95 ++++++++++++++++++++++++---------------------------
 1 file changed, 44 insertions(+), 51 deletions(-)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index 29c0a17d4f..86583e6ede 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -6,6 +6,42 @@ redirect_from:
 - /enterprise/upgrade/
 ---

+## Engine 18.09 Upgrades
+
+In Docker Engine 18.09, significant architectural improvements were made to the network architecture in Swarm to increase
+the performance and scale of the built-in load balancing functionality.
+
+***NOTE:*** These changes introduce new constraints to the upgrade process that, if not correctly followed, can
+impact the availability of applications running on the Swarm. These constraints impact any upgrades coming from any
+version before 18.09 to version 18.09 or greater.
+## IP Address Consumption in 18.09+
+
+In Swarm overlay networks, each task connected to a network consumes an IP address on that network. Swarm networks have a
+finite amount of IPs based on the `--subnet` configured when the network is created. If no subnet is specified, Swarm
+defaults to a `/24` network with 254 available IP addresses. When the IP space of a network is fully consumed, Swarm tasks
+can no longer be scheduled on that network.
+
+In Docker Enterprise Engine 18.09 and later, each Swarm node consumes an IP address from every Swarm network. This IP
+address is consumed by the Swarm internal load balancer on the network. Swarm networks running on Engine versions 18.09
+or greater must be configured to account for this increase in IP usage. Networks at or near full utilization prior to engine version 18.09 risk becoming exhausted, which prevents tasks from being scheduled onto the network.
+Maximum IP consumption per network at any given moment is given by the following formula:
+
+```
+Max IP Consumed per Network = Number of Tasks on a Swarm Network + Number of Nodes
+```
+
+To prevent this from happening, overlay networks should have enough capacity prior to an upgrade to 18.09 so that the network still has spare capacity after the upgrade. The instructions below offer tooling and steps to ensure capacity is measured before performing an upgrade.
+
+>The above only applies to containers running on Swarm overlay networks. This does not impact bridge, macvlan, host, or 3rd party docker networks.
+
+## Cluster Upgrade Best Practices
+
+Docker Engine upgrades in Swarm clusters should follow these guidelines in order to avoid application downtime.
+
+* Workloads should not be actively scheduled in the cluster during upgrades. Large version mismatches between managers and workers can cause unintended consequences.
+* Manager nodes should all be upgraded first before upgrading worker nodes.
Upgrading manager nodes sequentially is recommended if live workloads are running in the cluster during the upgrade.
+* Once manager nodes are upgraded, worker nodes should be upgraded next, and then the Swarm cluster upgrade is complete.
+* If running UCP, the UCP upgrade should follow once all of the Swarm engines have been upgraded.
+
 To upgrade Docker Enterprise Edition you need to individually upgrade each of
 the following components:

@@ -59,9 +95,7 @@ To ensure that workloads running as Swarm services have no downtime, you need to:

-1. Determine if the network is in danger of exaustion, then
+1. Determine if the network is in danger of exhaustion, then
    a. Triage and fix an upgrade that exhausted IP address space, or
    b. Upgrade a service network live to add IP addresses, or
 3. Drain the node you want to upgrade so that services get scheduled in another node.
 4. Upgrade the Docker Engine on that node.
 5. Make the node available again.

@@ -74,7 +110,7 @@ time can lead to a loss of quorum, and possible data loss.

 ### Determine if the network is in danger of exaustion

-Starting with a cluser with one or more services configured, determine whether some networks
+Starting with a cluster with one or more services configured, determine whether some networks
 may require update in order to function correctly after an 18.09 upgrade.

 1. SSH into a manager node.

@@ -85,7 +121,7 @@ may require update in order to function correctly after an 18.09 upgrade.

 3.
Run the following:

```
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock docker/ip-util-check
```

@@ -106,7 +142,7 @@ If the network is in danger of exhaustion, the output will show similar warnings

 #### Triage and fix an upgrade that exhausted IP address space

-Starting with a cluser with services that exhaust their overlay address space in 18.09, adjust the deployment to fix this issue.
+Starting with a cluster with services that exhaust their overlay address space in 18.09, adjust the deployment to fix this issue.

 1. SSH into a manager node.

@@ -151,55 +187,12 @@ i64lee19ia6s \_ ex_service.11 nginx:latest tk1706-ubuntu-1

 7. Remove the original service and re-deploy with the new compose file. Confirm the adjusted service deployed successfully.

-#### Upgrade a service network live to add IP addresses
+## Manager Upgrades When Moving to 18.09+
+The following is a constraint introduced by architectural changes to the Swarm overlay networking when upgrading to 18.09. It only applies to this one-time upgrade and to workloads that are using the Swarm overlay driver. Once upgraded to 18.09, this constraint does not impact future upgrades.

-Identify a subnet with few remaining IP addresses in a live service and upgrade the network live to add IP addresses.
+When upgrading to 18.09, manager nodes cannot reschedule new workloads on the managers until all managers have been upgraded to the 18.09 (or higher) version. During the upgrade of the managers, there is a possibility that any new workloads that are scheduled on the managers will fail to schedule until all of the managers have been upgraded.

-1. SSH into a manager node.
-
-2. Fetch and deploy a service that has very few IP addresses available in one of its overlay networks.
-
-3.
Run the following to determine if the subnet is near capacity:
-
-```
-$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock ctelfer/ip-util-check
-```
-
-4. Run the following to create a new subnet for the services on the overloaded subnet XXX. Substitute the overloaded network name for XXX.
-
-```
-$ docker network create -d overlay --subnet=10.252.0.0/16 XXX_bump_addrs
-```
-
-5. Run the following for each service to add the new network to the service.
-
-```
-$ docker service update --detach=false --network-add XXX_bump_addrs ex_serviceY
-```
-
-6. Run the following for each service attached to XXX to remove the overloaded network from the service.
-
-```
-$ docker service update --detach=false --network-rm XXX ex_serviceY
-```
-
-7. Run the following to remove the now unused network.
-
-```
-$ docker network rm XXX
-```
-
-8. Repeat the process of adding a new network with fresh address space, but name it the same as the original overloaded subnet.
-Then remove the "XXX_bump_addrs" subnet from each service. This leaves all services attached to a network named XXX, but with an
-increased pool of addresses.
-
-9. Run the following to confirm that subnet allocations are satisfactory.
-
-```
-$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock ctelfer/ip-util-check
-```
+In order to avoid any impactful application downtime, it is advised to reschedule any critical workloads onto Swarm worker nodes during the upgrade of managers. Worker nodes and their network functionality will continue to operate independently during any upgrades or outages on the managers. Note that this restriction only applies to managers and not worker nodes.

 ### Perform a hit-less upgrade

From 5af5e2cf4c08ef59765f26dea44941a5f443e942 Mon Sep 17 00:00:00 2001
From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com>
Date: Thu, 1 Nov 2018 07:54:14 -0600
Subject: [PATCH 05/13] Update upgrade.md

Fixed exaustion to exhaustion.
---
 ee/upgrade.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index 86583e6ede..f40e2ec76b 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -34,7 +34,7 @@ To prevent this from happening, overlay networks should have enough capacity pri
 >The above only applies to containers running on Swarm overlay networks. This does not impact bridge, macvlan, host, or 3rd party docker networks.

 ## Cluster Upgrade Best Practices
-Docker Engine upgrades in Swarm clusters should follow these guidelines in order to avoid aexaustionpplication downtime.
+Docker Engine upgrades in Swarm clusters should follow these guidelines in order to avoid application downtime.

 * Workloads should not be actively scheduled in the cluster during upgrades. Large version mismatches between managers and workers can cause unintended consequences.
 * Manager nodes should all be upgraded first before upgrading worker nodes. Upgrading manager nodes sequentially is recommended if live workloads are running in the cluster during the upgrade.
 * Once manager nodes are upgraded, worker nodes should be upgraded next, and then the Swarm cluster upgrade is complete.
 * If running UCP, the UCP upgrade should follow once all of the Swarm engines have been upgraded.

@@ -108,7 +108,7 @@ When upgrading manager nodes, make sure the upgrade of a node finishes before
 you start upgrading the next node. Upgrading multiple manager nodes at the same
 time can lead to a loss of quorum, and possible data loss.

-### Determine if the network is in danger of exaustion
+### Determine if the network is in danger of exhaustion

 Starting with a cluster with one or more services configured, determine whether some networks
 may require update in order to function correctly after an 18.09 upgrade.
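The exhaustion check that the patches above describe is just arithmetic over the consumption formula. A minimal sketch, added for this review (it illustrates the rule, it is not the `ip-util-check` tool's actual implementation, and the real tool may also account for additional reserved addresses such as per-subnet gateways):

```python
import ipaddress

def in_danger(subnet, tasks, nodes):
    """Apply the 18.09 rule of thumb: every task consumes one address, and
    after the upgrade every node consumes one more for its load balancer."""
    net = ipaddress.ip_network(subnet)
    usable = net.num_addresses - 2  # exclude network and broadcast addresses
    return tasks + nodes > usable

# A default overlay network is a /24, i.e. 254 usable addresses,
# matching the figure quoted in the IP Address Consumption section.
default_capacity = ipaddress.ip_network("10.0.0.0/24").num_addresses - 2
```

Under these assumptions, a /24 carrying 250 tasks is safe on a small cluster today but tips over once node-count consumption is added, which is exactly the pre-upgrade condition the docs tell you to triage.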
From b172968d4efa98bdbda04df2e93b1423ff27d916 Mon Sep 17 00:00:00 2001
From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com>
Date: Thu, 1 Nov 2018 14:14:23 -0600
Subject: [PATCH 06/13] Update upgrade.md

---
 ee/upgrade.md | 92 ---------------------------------------------------
 1 file changed, 92 deletions(-)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index f40e2ec76b..a49c58160f 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -194,98 +194,6 @@ When upgrading to 18.09, manager nodes cannot reschedule new workloads on the ma

 In order to avoid any impactful application downtime, it is advised to reschedule any critical workloads onto Swarm worker nodes during the upgrade of managers. Worker nodes and their network functionality will continue to operate independently during any upgrades or outages on the managers. Note that this restriction only applies to managers and not worker nodes.

-### Perform a hit-less upgrade
-
-To upgrade an entire Docker environment, use the following steps.
-
-1. SSH into the manager node.
-
-2. Promote two other nodes to manager:
-
-```
-$ docker node promote manager1
-$ docker node promote manager2
-```
-
-3. Start a stack with clients connecting to services. For example:
-
-```
-$ curl "https://raw.githubusercontent.com/ctelfer/moby-lb-upgrade-test/master/upgrade_test_ct/docker-compose.yml" > docker-compose.yml
-$ docker stack deploy --compose-file docker-compose.yml test
-```
-
-4. Upgrade all subsequent managers:
-
-   a. SSH into each manager.
-
-   b. Drain containers from the node:
-
-   ```
-   $ docker node update --availability drain $(docker node ls | grep managerY | awk '{print $1}')
-   ```
-
-   c. Verify containers have been moved off:
-
-   ```
-   $ docker container ls
-   ```
-
-   d. Upgrade Docker to 18.09 on the system.
-
-5. After upgrading all the managers, reactivate all the nodes:
-
-   a. SSH into each manager.
-
-   b.
Run the following to update all the nodes:
-
-   ```
-   $ for m in manager0 manager1 manager2 ; do \
-       docker node update --availability active $(docker node ls | grep $m | awk '{print $1}') ; \
-     done
-   ```
-
-6. Repeat the steps above for each worker but with two differences:
-   a. You must drain and activate the workers from a manager.
-   b. It is possible to reactivate each worker as soon as the upgrade for that worker is done.
-
-### Troubleshooting the hit-less upgrade
-
-If you re-activate a manager immediately instead of waiting for upgrades to the other managers, do the following:
-
-1. Run the following and observe rejected tasks:
-
-```
-$ docker service ps <service>
-```
-
-2. Run the following, where XXX is the ID of one of the service tasks, and look for `"Err": "node is missing network attachments, ip addresses may be exhausted"`:
-
-```
-$ docker inspect XXX
-```
-
-3. Finish the upgrade, and the service will resume.
-
-If you forgot to drain the managers first, do the following:
-
-1. Run the following, and observe rejected tasks with an error `"cannot create a swarm scoped …"`:
-
-```
-$ docker service ps <service>
-```
-
-2. Run the following, where XXX is the ID of one of the service tasks, and look for `"Err": "cannot create a swarm scoped network when swarm is not active"`:
-
-```
-$ docker inspect XXX
-```
-
-3. Finish the upgrade, and the service will resume.

 ### Drain the node

From 56e17971951f3e7f5801c026e94f8a8c9bea7638 Mon Sep 17 00:00:00 2001
From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com>
Date: Thu, 1 Nov 2018 14:27:56 -0600
Subject: [PATCH 07/13] Update upgrade.md

Incorporated the last of @mark-church's fixes.
---
 ee/upgrade.md | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index a49c58160f..07098b8e9b 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -36,7 +36,7 @@ To prevent this from happening, overlay networks should have enough capacity pri
 ## Cluster Upgrade Best Practices
 Docker Engine upgrades in Swarm clusters should follow these guidelines in order to avoid application downtime.

-* Workloads should not be actively scheduled in the cluster during upgrades. Large version mismatches between managers and workers can cause unintended consequences.
+* New workloads should not be actively scheduled in the cluster during upgrades. Large version mismatches between managers and workers can cause unintended consequences when new workloads are scheduled.
 * Manager nodes should all be upgraded first before upgrading worker nodes. Upgrading manager nodes sequentially is recommended if live workloads are running in the cluster during the upgrade.
 * Once manager nodes are upgraded, worker nodes should be upgraded next, and then the Swarm cluster upgrade is complete.
 * If running UCP, the UCP upgrade should follow once all of the Swarm engines have been upgraded.

@@ -216,14 +216,11 @@ listed below:
 * [Oracle Linux](/install/linux/docker-ee/oracle.md#upgrade-docker-ee)
 * [SLES](/install/linux/docker-ee/suse.md#upgrade-docker-ee)

-### Make the node active
-
-Once you finish upgrading the node, make it available to run workloads. For
-this, run:
-
-```
-docker node update --availability active
-```
+### Post-Upgrade Steps
+
+After all manager and worker nodes have been upgraded, the Swarm cluster can be used again to schedule new
+workloads. If workloads were previously rescheduled off the managers, they can be scheduled there again.
+If any worker nodes were drained, they can be undrained again by setting `--availability active`.
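Before undraining, it helps to see which nodes are still drained. The sketch below is an illustration added for this review: it filters a captured `docker node ls` listing for nodes in the `Drain` state, assuming the usual column layout; the sample node names and IDs are invented:

```python
# Hedged sketch: given captured `docker node ls` output (sample lines are
# illustrative, not from a real cluster), list the nodes still drained so
# they can be reactivated with `docker node update --availability active`.
listing = """\
ID            HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
abc123def456  manager0  Ready   Active        Leader
def456abc789  worker0   Ready   Drain
789abc123def  worker1   Ready   Active
"""

def drained_nodes(node_ls_output):
    drained = []
    for line in node_ls_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "Drain":
            drained.append(fields[1])             # hostname column
    return drained

still_drained = drained_nodes(listing)
```

In practice the same filtering is usually done directly in the shell, for example with `docker node ls --filter` or an `awk` one-liner, as the hit-less upgrade steps above already do.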
## Upgrade UCP

Once you've upgraded the Docker Engine running on all the nodes, upgrade UCP.
You can do this from the UCP web UI.

![UCP update notification banner](images/upgrade-1.png){: .with-border}

From 97adcf1dbba4a45a00533a82896f84fe478b2504 Mon Sep 17 00:00:00 2001
From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com>
Date: Fri, 2 Nov 2018 07:50:35 -0600
Subject: [PATCH 08/13] Update upgrade.md

Added Docker Enterprise Engine per
https://github.com/docker/docs-private/pull/836/files/56e17971951f3e7f5801c026e94f8a8c9bea7638#r230222063

---
 ee/upgrade.md | 53 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index 07098b8e9b..f08b301ac5 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -6,16 +6,17 @@ redirect_from:
 - /enterprise/upgrade/
 ---

-## Engine 18.09 Upgrades
+## Docker Enterprise Engine 18.09 Upgrades

-In Docker Engine 18.09, significant architectural improvements were made to the network architecture in Swarm to increase
-the performance and scale of the built-in load balancing functionality.
+In Docker Enterprise Engine 18.09, significant architectural improvements were made to the network
+architecture in Swarm to increase the performance and scale of the built-in load balancing functionality.

-***NOTE:*** These changes introduce new constraints to the upgrade process that, if not correctly followed, can
-impact the availability of applications running on the Swarm. These constraints impact any upgrades coming from any
-version before 18.09 to version 18.09 or greater.
+***NOTE:*** These changes introduce new constraints to the Docker Enterprise Engine upgrade process that,
+if not correctly followed, can impact the availability of applications running on the Swarm. These
+constraints impact any upgrades coming from any version before 18.09 to version 18.09 or greater.

 ## IP Address Consumption in 18.09+

 In Swarm overlay networks, each task connected to a network consumes an IP address on that network. Swarm networks have a
 finite amount of IPs based on the `--subnet` configured when the network is created.
If no subnet is specified, Swarm
defaults to a `/24` network with 254 available IP addresses. When the IP space of a network is fully consumed, Swarm tasks
can no longer be scheduled on that network.

 ## Cluster Upgrade Best Practices
-Docker Engine upgrades in Swarm clusters should follow these guidelines in order to avoid application downtime.
+Docker Enterprise Engine upgrades in Swarm clusters should follow these guidelines in order to avoid
+application downtime.

 To upgrade Docker Enterprise Edition you need to individually upgrade each of
 the following components:

-1. Docker Engine.
+1. Docker Enterprise Engine.
 2. Universal Control Plane (UCP).
 3. Docker Trusted Registry (DTR).

 ## Create a backup

-Before upgrading Docker EE, you should make sure you [create a backup](backup.md).
+Before upgrading Docker Enterprise Engine, you should make sure you [create a backup](backup.md).
 This makes it possible to recover if anything goes wrong during the upgrade.

 ## Check the compatibility matrix

 Before you upgrade, make sure:

 > the UCP controller.
{: .important}

-## Upgrade Docker Engine
+## Upgrade Docker Enterprise Engine

-To avoid application downtime, you should be running Docker in Swarm mode and
-deploying your workloads as Docker services. That way you can
+To avoid application downtime, you should be running Docker Enterprise Engine in
+Swarm mode and deploying your workloads as Docker services. That way you can
 drain the nodes of any workloads before starting the upgrade.

 If you have workloads running as containers as opposed to swarm services,

 To ensure that workloads running as Swarm services have no downtime, you need to:

 1. Determine if the network is in danger of exhaustion, then
    a. Triage and fix an upgrade that exhausted IP address space, or
    b. Upgrade a service network live to add IP addresses, or
 3. Drain the node you want to upgrade so that services get scheduled in another node.
 4. Upgrade the Docker Engine on that node.
 5. Make the node available again.

-If you do this sequentially for every node, you can upgrade with no
-application downtime.
+If you do this sequentially for every node, you can upgrade with no application downtime.

 When upgrading manager nodes, make sure the upgrade of a node finishes before
 you start upgrading the next node. Upgrading multiple manager nodes at the same
 time can lead to a loss of quorum, and possible data loss.

 ### Determine if the network is in danger of exhaustion

 Starting with a cluster with one or more services configured, determine whether some networks
-may require update in order to function correctly after an 18.09 upgrade.
+may require update in order to function correctly after a Docker Enterprise Engine 18.09 upgrade.

 1.
SSH into a manager node.

@@ -187,12 +188,22 @@
i64lee19ia6s \_ ex_service.11 nginx:latest tk1706-ubuntu-1

7. Remove the original service and re-deploy with the new compose file. Confirm the adjusted service deployed successfully.

-## Manager Upgrades When Moving to 18.09+
-The following is a constraint introduced by architectural changes to the Swarm overlay networking when upgrading to 18.09. It only applies to this one-time upgrade and to workloads that are using the Swarm overlay driver. Once upgraded to 18.09, this constraint does not impact future upgrades.
+## Manager Upgrades When Moving to Docker Enterprise Engine 18.09 and later

-When upgrading to 18.09, manager nodes cannot reschedule new workloads on the managers until all managers have been upgraded to the 18.09 (or higher) version. During the upgrade of the managers, there is a possibility that any new workloads that are scheduled on the managers will fail to schedule until all of the managers have been upgraded.
+The following is a constraint introduced by architectural changes to the Swarm overlay networking when
+upgrading to Docker Enterprise Engine 18.09 or later. It only applies to this one-time upgrade and to
+workloads that are using the Swarm overlay driver. Once upgraded to Docker Enterprise Engine 18.09, this
+constraint does not impact future upgrades.

-In order to avoid any impactful application downtime, it is advised to reschedule any critical workloads on to Swarm worker nodes during the upgrade of managers. Worker nodes and their network functionality will continue to operate independently during any upgrades or outages on the managers. Note that this restriction only applies to managers and not worker nodes.
+When upgrading to Docker Enterprise Engine 18.09, manager nodes cannot reschedule new workloads on the
+managers until all managers have been upgraded to the Docker Enterprise Engine 18.09 (or higher) version.
+During the upgrade of the managers, there is a possibility that any new workloads that are scheduled on +the managers will fail to schedule until all of the managers have been upgraded. + +In order to avoid any impactful application downtime, it is advised to reschedule any critical workloads +on to Swarm worker nodes during the upgrade of managers. Worker nodes and their network functionality +will continue to operate independently during any upgrades or outages on the managers. Note that this +restriction only applies to managers and not worker nodes. ### Drain the node @@ -224,7 +235,7 @@ If any worker nodes were drained, they can be undrained again by setting `--avai ## Upgrade UCP -Once you've upgraded the Docker Engine running on all the nodes, upgrade UCP. +Once you've upgraded the Docker Enterprise Engine running on all the nodes, upgrade UCP. You can do this from the UCP web UI. ![UCP update notification banner](images/upgrade-1.png){: .with-border} From 5e3ea7ca22ead9b5b4ae8ca0d79789d55bfef707 Mon Sep 17 00:00:00 2001 From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com> Date: Fri, 2 Nov 2018 07:55:05 -0600 Subject: [PATCH 09/13] Update upgrade.md Fixed https://github.com/docker/docs-private/pull/836/files#r230223172, https://github.com/docker/docs-private/pull/836/files#r230223631, https://github.com/docker/docs-private/pull/836/files#r230225394, and https://github.com/docker/docs-private/pull/836/files#r230236731 --- ee/upgrade.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/ee/upgrade.md b/ee/upgrade.md index f08b301ac5..e6d0115963 100644 --- a/ee/upgrade.md +++ b/ee/upgrade.md @@ -28,8 +28,9 @@ or greater must be configured to account for this increase in IP usage. 
Networks Maximum IP consumption per network at any given moment follows the following formula: ``` -Max IP Consumed per Network = Number of Tasks on a Swarm Network + Number of Nodes +Max IP Consumed per Network = Number of Tasks on a Swarm Network + 1 IP for each node where these tasks are scheduled ``` + To prevent this from happening, overlay networks should have enough capacity prior to an upgrade to 18.09, such that the network will have enough capacity after the upgrade. The below instructions offer tooling and steps to ensure capacity is measured before performing an upgrade. >The above following only applies to containers running on Swarm overlay networks. This does not impact bridge, macvlan, host, or 3rd party docker networks. @@ -99,10 +100,10 @@ To ensure that workloads running as Swarm services have no downtime, you need to 1. Determine if the network is in danger of exhaustion, then a. Triage and fix an upgrade that exhausted IP address space, or - b. Upgrade a service network live to add IP addresses, or -3. Drain the node you want to upgrade so that services get scheduled in another node. -4. Upgrade the Docker Engine on that node. -5. Make the node available again. + b. Upgrade a service network live to add IP addresses +2. Drain the node you want to upgrade so that services get scheduled in another node. +3. Upgrade the Docker Engine on that node. +4. Make the node available again. If you do this sequentially for every node, you can upgrade with no application downtime. When upgrading manager nodes, make sure the upgrade of a node finishes before @@ -131,7 +132,7 @@ If the network is in danger of exhaustion, the output will show similar warnings Overlay IP Utilization Report ---- Network ex_net1/XXXXXXXXXXXX has an IP address capacity of 29 and uses 28 addresses - ERROR: network will be over capacity if upgrading Docker engine version 18.06 + ERROR: network will be over capacity if upgrading Docker engine version 18.09 or later. 
---- Network ex_net2/YYYYYYYYYYYY has an IP address capacity of 29 and uses 24 addresses From c8a2fb354a8dddc76e0f8bf1714709b767e75787 Mon Sep 17 00:00:00 2001 From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com> Date: Mon, 5 Nov 2018 16:16:32 -0700 Subject: [PATCH 10/13] Update upgrade.md fixed Docker Engine - Enterprise and Note formatting. --- ee/upgrade.md | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/ee/upgrade.md b/ee/upgrade.md index e6d0115963..4cefd8fcde 100644 --- a/ee/upgrade.md +++ b/ee/upgrade.md @@ -6,14 +6,14 @@ redirect_from: - /enterprise/upgrade/ --- -## Docker Enterprise Engine 18.09 Upgrades +## Docker Engine - Enterprise 18.09 Upgrades -In Docker Enterprise Engine 18.09, significant architectural improvements were made to the network +In Docker Engine - Enterprise 18.09, significant architectural improvements were made to the network architecture in Swarm to increase the performance and scale of the built-in load balancing functionality. -***NOTE:*** These changes introduce new constraints to the Docker Enterprise Engine upgrade process that, -if not correctly followed, can have impact on the availability of applications running on the Swarm. These -constraints impact any upgrades coming from any version before 18.09 to version 18.09 or greater. +> ***NOTE:*** These changes introduce new constraints to the Docker Engine - Enterprise upgrade process that, +> if not correctly followed, can have impact on the availability of applications running on the Swarm. These +> constraints impact any upgrades coming from any version before 18.09 to version 18.09 or greater. ## IP Address Consumption in 18.09+ @@ -22,7 +22,7 @@ finite amount of IPs based on the `--subnet` configured when the network is crea defaults to a `/24` network with 254 available IP addresses. When the IP space of a network is fully consumed, Swarm tasks can no longer be scheduled on that network. 
-Docker Enterprise Engine 18.09 and later, each Swarm node will consume an IP address from every Swarm network. This IP +Docker Engine - Enterprise 18.09 and later, each Swarm node will consume an IP address from every Swarm network. This IP address is consumed by the Swarm internal load balancer on the network. Swarm networks running on Engine versions 18.09 or greater must be configured to account for this increase in IP usage. Networks at or near consumption prior to engine version 18.09 may have a risk of reaching full utilization that will prevent tasks from being scheduled on to the network. Maximum IP consumption per network at any given moment follows the following formula: @@ -36,7 +36,7 @@ To prevent this from happening, overlay networks should have enough capacity pri >The above following only applies to containers running on Swarm overlay networks. This does not impact bridge, macvlan, host, or 3rd party docker networks. ## Cluster Upgrade Best Practices -Docker Enterprise Engine upgrades in Swarm clusters should follow these guidelines in order to avoid exhaustion +Docker Engine - Enterprise upgrades in Swarm clusters should follow these guidelines in order to avoid exhaustion application downtime. * New workloads should not be actively scheduled in the cluster during upgrades. Large version mismatches between managers and workers can cause unintended consequences when new workloads are scheduled. @@ -45,10 +45,10 @@ application downtime. * If running UCP, the UCP upgrade should follow once all of the Swarm engines have been upgraded. -To upgrade Docker Enterprise Edition you need to individually upgrade each of the +To upgrade Docker Engine - Enterprise you need to individually upgrade each of the following components: -1. Docker Enterprise Engine. +1. Docker Engine - Enterprise. 2. Universal Control Plane (UCP). 3. Docker Trusted Registry (DTR). @@ -58,7 +58,7 @@ to make sure there's no impact to your business. 
## Create a backup -Before upgrading Docker Enterprise Engine, you should make sure you [create a backup](backup.md). +Before upgrading Docker Engine - Enterprise, you should make sure you [create a backup](backup.md). This makes it possible to recover if anything goes wrong during the upgrade. ## Check the compatibility matrix @@ -86,9 +86,9 @@ Before you upgrade, make sure: > the UCP controller. {: .important} -## Upgrade Docker Enterprise Engine +## Upgrade Docker Engine - Enterprise -To avoid application downtime, you should be running Docker Enterprise Engine in +To avoid application downtime, you should be running Docker Engine - Enterprise in Swarm mode and deploying your workloads as Docker services. That way you can drain the nodes of any workloads before starting the upgrade. @@ -113,7 +113,7 @@ time can lead to a loss of quorum, and possible data loss. ### Determine if the network is in danger of exhaustion Starting with a cluster with one or more services configured, determine whether some networks -may require update in order to function correctly after an Docker Enterprise Engine 18.09 upgrade. +may require update in order to function correctly after an Docker Engine - Enterprise 18.09 upgrade. 1. SSH into a manager node. @@ -144,7 +144,7 @@ If the network is in danger of exhaustion, the output will show similar warnings #### Triage and fix an upgrade that exhausted IP address space -Starting with a cluster with services that exhaust their overlay address space in Docker Enterprise Engine 18.09, adjust the deployment to fix this issue. +Starting with a cluster with services that exhaust their overlay address space in Docker Engine - Enterprise 18.09, adjust the deployment to fix this issue. 1. SSH into a manager node. @@ -189,15 +189,15 @@ i64lee19ia6s \_ ex_service.11 nginx:latest tk1706-ubuntu-1 7. Remove the original service and re-deploy with the new compose file. Confirm the adjusted service deployed successfully. 
-## Manager Upgrades When Moving to Docker Enterprise Engine 18.09 and later
+## Manager Upgrades When Moving to Docker Engine - Enterprise 18.09 and later

The following is a constraint introduced by architectural changes to the Swarm overlay networking when
-upgrading to Docker Enterprise Engine 18.09 or later. It only applies to this one-time upgrade and to
-workloads that are using the Swarm overlay driver. Once upgraded to Docker Enterprise Engine 18.09, this
+upgrading to Docker Engine - Enterprise 18.09 or later. It only applies to this one-time upgrade and to
+workloads that are using the Swarm overlay driver. Once upgraded to Docker Engine - Enterprise 18.09, this
constraint does not impact future upgrades.

-When upgrading to Docker Enterprise Engine 18.09, manager nodes cannot reschedule new workloads on the
-managers until all managers have been upgraded to the Docker Enterprise Engine 18.09 (or higher) version.
+When upgrading to Docker Engine - Enterprise 18.09, manager nodes cannot reschedule new workloads on the
+managers until all managers have been upgraded to the Docker Engine - Enterprise 18.09 (or higher) version.
During the upgrade of the managers, there is a possibility that any new workloads that are scheduled on
the managers will fail to schedule until all of the managers have been upgraded.

@@ -236,7 +236,7 @@ If any worker nodes were drained, they can be undrained again by setting `--avai

## Upgrade UCP

-Once you've upgraded the Docker Enterprise Engine running on all the nodes, upgrade UCP.
+Once you've upgraded the Docker Engine - Enterprise running on all the nodes, upgrade UCP.
You can do this from the UCP web UI.
![UCP update notification banner](images/upgrade-1.png){: .with-border} From d4b661692aff74cf93980f532cd0b838b3c82ddb Mon Sep 17 00:00:00 2001 From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com> Date: Mon, 5 Nov 2018 16:42:25 -0700 Subject: [PATCH 11/13] Update upgrade.md --- ee/upgrade.md | 80 +++++++++++++++++++++++++-------------------------- 1 file changed, 39 insertions(+), 41 deletions(-) diff --git a/ee/upgrade.md b/ee/upgrade.md index 4cefd8fcde..648a74e0aa 100644 --- a/ee/upgrade.md +++ b/ee/upgrade.md @@ -15,32 +15,12 @@ architecture in Swarm to increase the performance and scale of the built-in load > if not correctly followed, can have impact on the availability of applications running on the Swarm. These > constraints impact any upgrades coming from any version before 18.09 to version 18.09 or greater. -## IP Address Consumption in 18.09+ - -In Swarm overlay networks, each task connected to a network consumes an IP address on that network. Swarm networks have a -finite amount of IPs based on the `--subnet` configured when the network is created. If no subnet is specified then Swarm -defaults to a `/24` network with 254 available IP addresses. When the IP space of a network is fully consumed, Swarm tasks -can no longer be scheduled on that network. - -Docker Engine - Enterprise 18.09 and later, each Swarm node will consume an IP address from every Swarm network. This IP -address is consumed by the Swarm internal load balancer on the network. Swarm networks running on Engine versions 18.09 -or greater must be configured to account for this increase in IP usage. Networks at or near consumption prior to engine version 18.09 may have a risk of reaching full utilization that will prevent tasks from being scheduled on to the network. 
-Maximum IP consumption per network at any given moment follows the following formula: - -``` -Max IP Consumed per Network = Number of Tasks on a Swarm Network + 1 IP for each node where these tasks are scheduled -``` - -To prevent this from happening, overlay networks should have enough capacity prior to an upgrade to 18.09, such that the network will have enough capacity after the upgrade. The below instructions offer tooling and steps to ensure capacity is measured before performing an upgrade. - ->The above following only applies to containers running on Swarm overlay networks. This does not impact bridge, macvlan, host, or 3rd party docker networks. - ## Cluster Upgrade Best Practices Docker Engine - Enterprise upgrades in Swarm clusters should follow these guidelines in order to avoid exhaustion application downtime. * New workloads should not be actively scheduled in the cluster during upgrades. Large version mismatches between managers and workers can cause unintended consequences when new workloads are scheduled. -* Manager nodes should all be upgraded first before upgrading worker nodes. Upgrading manager nodes sequentially is recommended if live workloads in the cluster during the upgrade. +* Manager nodes should all be upgraded first before upgrading worker nodes. Upgrading manager nodes sequentially is recommended if live workloads are running in the cluster during the upgrade. * Once manager nodes are upgraded worker nodes should be upgraded next and then the Swarm cluster upgrade is complete. * If running UCP, the UCP upgrade should follow once all of the Swarm engines have been upgraded. @@ -49,8 +29,8 @@ To upgrade Docker Engine - Enterprise you need to individually upgrade each of t following components: 1. Docker Engine - Enterprise. -2. Universal Control Plane (UCP). -3. Docker Trusted Registry (DTR). +2. [Universal Control Plane (UCP)](/ee/ucp/admin/install/upgrade/). +3. [Docker Trusted Registry (DTR)](/ee/dtr/admin/upgrade/). 
While upgrading, some of these components become temporarily unavailable.
So you should schedule your upgrades to take place outside business peak hours
@@ -86,6 +66,26 @@ Before you upgrade, make sure:
> the UCP controller.
{: .important}

+## IP Address Consumption in 18.09+
+
+In Swarm overlay networks, each task connected to a network consumes an IP address on that network. Swarm networks have a
+finite amount of IPs based on the `--subnet` configured when the network is created. If no subnet is specified then Swarm
+defaults to a `/24` network with 254 available IP addresses. When the IP space of a network is fully consumed, Swarm tasks
+can no longer be scheduled on that network.
+
+Docker Engine - Enterprise 18.09 and later, each Swarm node will consume an IP address from every Swarm network. This IP
+address is consumed by the Swarm internal load balancer on the network. Swarm networks running on Engine versions 18.09
+or greater must be configured to account for this increase in IP usage. Networks at or near consumption prior to engine version 18.09 may have a risk of reaching full utilization that will prevent tasks from being scheduled on to the network.
+Maximum IP consumption per network at any given moment follows the following formula:
+
+```
+Max IP Consumed per Network = Number of Tasks on a Swarm Network + 1 IP for each node where these tasks are scheduled
+```
+
+To prevent this from happening, overlay networks should have enough capacity prior to an upgrade to 18.09, such that the network will have enough capacity after the upgrade. The below instructions offer tooling and steps to ensure capacity is measured before performing an upgrade.
+
+>The above only applies to containers running on Swarm overlay networks. This does not impact bridge, macvlan, host, or 3rd party docker networks.
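As a worked illustration of the formula above, the following sketch evaluates the maximum consumption for a hypothetical service and then picks the smallest IPv4 prefix whose usable host count covers it. The counts are invented for the example:

```python
import math

def max_ips_consumed(tasks: int, nodes_with_tasks: int) -> int:
    # The formula above: one IP per task, plus one IP for each node
    # where these tasks are scheduled.
    return tasks + nodes_with_tasks

def smallest_prefix(required: int) -> int:
    # Smallest IPv4 prefix length p whose usable host count
    # (2**(32 - p) - 2) covers the requirement.
    hosts = required + 2  # re-add the network and broadcast addresses
    return 32 - math.ceil(math.log2(hosts))

need = max_ips_consumed(tasks=500, nodes_with_tasks=10)
print(need, smallest_prefix(need))  # 510 23  (a /23 has exactly 510 usable IPs)
```

Note that 255 required IPs already pushes past a default `/24` (254 usable), which is why a network sized exactly to its pre-18.09 task count can fail after the upgrade.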
+ ## Upgrade Docker Engine - Enterprise To avoid application downtime, you should be running Docker Engine - Enterprise in @@ -98,9 +98,7 @@ This ensures that your containers are started automatically after the upgrade. To ensure that workloads running as Swarm services have no downtime, you need to: -1. Determine if the network is in danger of exhaustion, then - a. Triage and fix an upgrade that exhausted IP address space, or - b. Upgrade a service network live to add IP addresses +1. Determine if the network is in danger of exhaustion; and remediate to a new, larger network prior to upgrading. 2. Drain the node you want to upgrade so that services get scheduled in another node. 3. Upgrade the Docker Engine on that node. 4. Make the node available again. @@ -115,12 +113,9 @@ time can lead to a loss of quorum, and possible data loss. Starting with a cluster with one or more services configured, determine whether some networks may require update in order to function correctly after an Docker Engine - Enterprise 18.09 upgrade. -1. SSH into a manager node. +1. SSH into a manager node on a cluster where your applications are running. -2. Fetch and deploy a service that would exhaust IP addresses in one of its overlay networks, such as (https://raw.githubusercontent.com/ctelfer/moby-lb-upgrade-test/master/low_addrs/docker-compose.yml) - - -3. Run the following: +2. Run the following: ``` $ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock docker/ip-util-check @@ -142,22 +137,22 @@ If the network is in danger of exhaustion, the output will show similar warnings WARNING: network could exhaust IP addresses if the cluster scales to 9 or more nodes ``` +3. Once you determine all networks are sized appropriately, start the upgrade on the Swarm managers. 
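For clusters with many networks, the utilization report shown above can be scanned mechanically. The sketch below is not part of the `ip-util-check` image; it simply pattern-matches text shaped like the sample output:

```python
def networks_needing_attention(report: str) -> list[str]:
    # Collect networks whose report entry is followed by an ERROR or
    # WARNING line, as in the sample report in this guide.
    flagged, current = [], None
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("Network "):
            current = line.split()[1]          # e.g. "ex_net1/XXXXXXXXXXXX"
        elif line.startswith(("ERROR:", "WARNING:")) and current:
            flagged.append(current)
            current = None
    return flagged

sample = """
Overlay IP Utilization Report
----
Network ex_net1/XXXXXXXXXXXX has an IP address capacity of 29 and uses 28 addresses
 ERROR: network will be over capacity if upgrading Docker engine version 18.09 or later.
----
Network ex_net2/YYYYYYYYYYYY has an IP address capacity of 29 and uses 24 addresses
 WARNING: network could exhaust IP addresses if the cluster scales to 9 or more nodes
"""
print(networks_needing_attention(sample))  # ['ex_net1/XXXXXXXXXXXX', 'ex_net2/YYYYYYYYYYYY']
```

Any network flagged this way should be resized before starting the manager upgrades.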
+

#### Triage and fix an upgrade that exhausted IP address space

-Starting with a cluster with services that exhaust their overlay address space in Docker Engine - Enterprise 18.09, adjust the deployment to fix this issue.
+With an exhausted network, you can triage it using the following steps.

-1. SSH into a manager node.
+1. SSH into a manager node on a cluster where your applications are running.

-2. Fetch and deploy a service that exhausts IP addresses in one of its overlay networks such as (https://raw.githubusercontent.com/ctelfer/moby-lb-upgrade-test/master/exhaust_addrs_3_nodes/docker-compose.yml).
-
-3. Check the `docker service ls` output. It will display the service that is unable to completely fill all its replicas such as:
+2. Check the `docker service ls` output. It will display the service that is unable to completely fill all its replicas such as:

```
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
wn3x4lu9cnln        ex_service          replicated          19/24               nginx:latest
```

-4. Use `docker service ps ex_service` to find a failed replica such as:
+3. Use `docker service ps ex_service` to find a failed replica such as:

```
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR                               PORTS
 ...
i64lee19ia6s        \_ ex_service.11    nginx:latest        tk1706-ubuntu-1     Shutdown            Rejected 7 minutes ago   "node is missing network attac…"
 ...
```

-5. Examine the error using `docker inspect`. In this example, the `docker inspect i64lee19ia6s` output shows the error in the `Status.Err` field:
+4. Examine the error using `docker inspect`. In this example, the `docker inspect i64lee19ia6s` output shows the error in the `Status.Err` field:

```
...
    "Status": {
        "Timestamp": "2018-08-24T21:03:37.885405884Z",
        "State": "rejected",
        "Message": "preparing",
        "Err": "node is missing network attachments, ip addresses may be exhausted",
        "ContainerStatus": {
            "ContainerID": "",
            "PID": 0,
            "ExitCode": 0
        },
        "PortStatus": {}
    },
...
```

-6. Adjust the `- subnet:` field in `docker-compose.yml` to have a larger subnet such as `- subnet: 10.1.1.0/22`.
+5. Adjust your network subnet in the deployment manifest, such that it has enough IPs required by the application.

-7. Remove the original service and re-deploy with the new compose file. Confirm the adjusted service deployed successfully.
+6. Redeploy the application.
+
+7. Confirm the adjusted service deployed successfully.

## Manager Upgrades When Moving to Docker Engine - Enterprise 18.09 and later

@@ -210,10 +207,11 @@ restriction only applies to managers and not worker nodes.

### Drain the node

Start by draining the node so that services get scheduled in another node
and continue running without downtime.
+
For that, run this command on a manager node:

```
-docker node update --availability drain
+$ docker node update --availability drain <node>
```

### Perform the upgrade

From 3eda7110231491ac15288241c5bb0ca18926a6e2 Mon Sep 17 00:00:00 2001
From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com>
Date: Mon, 5 Nov 2018 16:45:08 -0700
Subject: [PATCH 12/13] Update upgrade.md

---
 ee/upgrade.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index 648a74e0aa..992e6fb4c3 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -205,6 +205,9 @@ restriction only applies to managers and not worker nodes.

### Drain the node

+If you are running live applications on the cluster while upgrading, remove applications from nodes being upgraded
+so as not to create unplanned outages.
+
Start by draining the node so that services get scheduled in another node
and continue running without downtime.

From f7cf167de2a32daf0a86954bd017e74ce0962660 Mon Sep 17 00:00:00 2001
From: Anne Henmi <41210220+ahh-docker@users.noreply.github.com>
Date: Tue, 6 Nov 2018 08:58:07 -0800
Subject: [PATCH 13/13] Update upgrade.md

incorporated @JustinINevill's changes.

---
 ee/upgrade.md | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/ee/upgrade.md b/ee/upgrade.md
index 992e6fb4c3..5bd4a06239 100644
--- a/ee/upgrade.md
+++ b/ee/upgrade.md
@@ -16,10 +16,11 @@ architecture in Swarm to increase the performance and scale of the built-in load
> constraints impact any upgrades coming from any version before 18.09 to version 18.09 or greater.
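The drain → upgrade → reactivate cycle described in the "Drain the node" section can be sketched as a small plan builder around the Docker CLI. The node names are hypothetical, and the engine-upgrade step is left as a placeholder since it depends on your package manager:

```python
def availability_cmd(node: str, availability: str) -> list[str]:
    # The CLI invocation used in the "Drain the node" section:
    # docker node update --availability <drain|active> <node>
    return ["docker", "node", "update", "--availability", availability, node]

def rolling_upgrade_plan(nodes: list[str]) -> list:
    # One node at a time: drain it so services reschedule elsewhere,
    # upgrade the engine, then make the node available again before
    # moving on -- which is what keeps the upgrade downtime-free.
    plan = []
    for node in nodes:
        plan.append(availability_cmd(node, "drain"))
        plan.append(f"<upgrade engine on {node}>")  # package-manager step
        plan.append(availability_cmd(node, "active"))
    return plan

print(rolling_upgrade_plan(["worker-1"]))
```

Each command list could be handed to `subprocess.run(cmd, check=True)` on a manager node; the sketch only builds the sequence so the ordering is explicit.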
## Cluster Upgrade Best Practices -Docker Engine - Enterprise upgrades in Swarm clusters should follow these guidelines in order to avoid exhaustion -application downtime. +Docker Engine - Enterprise upgrades in Swarm clusters should follow these guidelines in order to avoid IP address +space exhaustion and associated application downtime. -* New workloads should not be actively scheduled in the cluster during upgrades. Large version mismatches between managers and workers can cause unintended consequences when new workloads are scheduled. +* New workloads should not be actively scheduled in the cluster during upgrades. +* Large version mismatches between managers and workers can cause unintended consequences when new workloads are scheduled. * Manager nodes should all be upgraded first before upgrading worker nodes. Upgrading manager nodes sequentially is recommended if live workloads are running in the cluster during the upgrade. * Once manager nodes are upgraded worker nodes should be upgraded next and then the Swarm cluster upgrade is complete. * If running UCP, the UCP upgrade should follow once all of the Swarm engines have been upgraded. @@ -73,9 +74,12 @@ finite amount of IPs based on the `--subnet` configured when the network is crea defaults to a `/24` network with 254 available IP addresses. When the IP space of a network is fully consumed, Swarm tasks can no longer be scheduled on that network. -Docker Engine - Enterprise 18.09 and later, each Swarm node will consume an IP address from every Swarm network. This IP -address is consumed by the Swarm internal load balancer on the network. Swarm networks running on Engine versions 18.09 -or greater must be configured to account for this increase in IP usage. Networks at or near consumption prior to engine version 18.09 may have a risk of reaching full utilization that will prevent tasks from being scheduled on to the network. 
+Starting with Docker Engine - Enterprise 18.09, each Swarm node will consume an IP address from every Swarm
+network. This IP address is consumed by the Swarm internal load balancer on the network. Swarm networks running on Engine
+versions 18.09 or greater must be configured to account for this increase in IP usage. Networks at or near consumption
+prior to engine version 18.09 may have a risk of reaching full utilization that will prevent tasks from being scheduled
+on to the network.
+
Maximum IP consumption per network at any given moment follows the following formula:

```
@@ -111,7 +115,8 @@ time can lead to a loss of quorum, and possible data loss.
### Determine if the network is in danger of exhaustion

Starting with a cluster with one or more services configured, determine whether some networks
-may require update in order to function correctly after an Docker Engine - Enterprise 18.09 upgrade.
+may require updating the IP address space in order to function correctly after a Docker
+Engine - Enterprise 18.09 upgrade.

1. SSH into a manager node on a cluster where your applications are running.