Update upgrade.md

This commit is contained in:
Anne Henmi 2018-11-05 16:42:25 -07:00 committed by GitHub
parent c8a2fb354a
commit d4b661692a
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 39 additions and 41 deletions

View File

@ -15,32 +15,12 @@ architecture in Swarm to increase the performance and scale of the built-in load
> if not correctly followed, can have impact on the availability of applications running on the Swarm. These
> constraints impact any upgrades coming from any version before 18.09 to version 18.09 or greater.
## IP Address Consumption in 18.09+
In Swarm overlay networks, each task connected to a network consumes an IP address on that network. Swarm networks have a
finite amount of IPs based on the `--subnet` configured when the network is created. If no subnet is specified then Swarm
defaults to a `/24` network with 254 available IP addresses. When the IP space of a network is fully consumed, Swarm tasks
can no longer be scheduled on that network.
Docker Engine - Enterprise 18.09 and later, each Swarm node will consume an IP address from every Swarm network. This IP
address is consumed by the Swarm internal load balancer on the network. Swarm networks running on Engine versions 18.09
or greater must be configured to account for this increase in IP usage. Networks at or near consumption prior to engine version 18.09 may have a risk of reaching full utilization that will prevent tasks from being scheduled on to the network.
Maximum IP consumption per network at any given moment follows the following formula:
```
Max IP Consumed per Network = Number of Tasks on a Swarm Network + 1 IP for each node where these tasks are scheduled
```
To prevent this from happening, overlay networks should have enough capacity prior to an upgrade to 18.09, such that the network will have enough capacity after the upgrade. The below instructions offer tooling and steps to ensure capacity is measured before performing an upgrade.
>The above following only applies to containers running on Swarm overlay networks. This does not impact bridge, macvlan, host, or 3rd party docker networks.
## Cluster Upgrade Best Practices
Docker Engine - Enterprise upgrades in Swarm clusters should follow these guidelines in order to avoid exhaustion
application downtime.
* New workloads should not be actively scheduled in the cluster during upgrades. Large version mismatches between managers and workers can cause unintended consequences when new workloads are scheduled.
* Manager nodes should all be upgraded first before upgrading worker nodes. Upgrading manager nodes sequentially is recommended if live workloads in the cluster during the upgrade.
* Manager nodes should all be upgraded first before upgrading worker nodes. Upgrading manager nodes sequentially is recommended if live workloads are running in the cluster during the upgrade.
* Once manager nodes are upgraded worker nodes should be upgraded next and then the Swarm cluster upgrade is complete.
* If running UCP, the UCP upgrade should follow once all of the Swarm engines have been upgraded.
@ -49,8 +29,8 @@ To upgrade Docker Engine - Enterprise you need to individually upgrade each of t
following components:
1. Docker Engine - Enterprise.
2. Universal Control Plane (UCP).
3. Docker Trusted Registry (DTR).
2. [Universal Control Plane (UCP)](/ee/ucp/admin/install/upgrade/).
3. [Docker Trusted Registry (DTR)](/ee/dtr/admin/upgrade/).
While upgrading, some of these components become temporarily unavailable.
So you should schedule your upgrades to take place outside business peak hours
@ -86,6 +66,26 @@ Before you upgrade, make sure:
> the UCP controller.
{: .important}
## IP Address Consumption in 18.09+
In Swarm overlay networks, each task connected to a network consumes an IP address on that network. Swarm networks have a
finite amount of IPs based on the `--subnet` configured when the network is created. If no subnet is specified then Swarm
defaults to a `/24` network with 254 available IP addresses. When the IP space of a network is fully consumed, Swarm tasks
can no longer be scheduled on that network.
Docker Engine - Enterprise 18.09 and later, each Swarm node will consume an IP address from every Swarm network. This IP
address is consumed by the Swarm internal load balancer on the network. Swarm networks running on Engine versions 18.09
or greater must be configured to account for this increase in IP usage. Networks at or near consumption prior to engine version 18.09 may have a risk of reaching full utilization that will prevent tasks from being scheduled on to the network.
Maximum IP consumption per network at any given moment follows the following formula:
```
Max IP Consumed per Network = Number of Tasks on a Swarm Network + 1 IP for each node where these tasks are scheduled
```
To prevent this from happening, overlay networks should have enough capacity prior to an upgrade to 18.09, such that the network will have enough capacity after the upgrade. The below instructions offer tooling and steps to ensure capacity is measured before performing an upgrade.
>The above following only applies to containers running on Swarm overlay networks. This does not impact bridge, macvlan, host, or 3rd party docker networks.
## Upgrade Docker Engine - Enterprise
To avoid application downtime, you should be running Docker Engine - Enterprise in
@ -98,9 +98,7 @@ This ensures that your containers are started automatically after the upgrade.
To ensure that workloads running as Swarm services have no downtime, you need to:
1. Determine if the network is in danger of exhaustion, then
a. Triage and fix an upgrade that exhausted IP address space, or
b. Upgrade a service network live to add IP addresses
1. Determine if the network is in danger of exhaustion; and remediate to a new, larger network prior to upgrading.
2. Drain the node you want to upgrade so that services get scheduled in another node.
3. Upgrade the Docker Engine on that node.
4. Make the node available again.
@ -115,12 +113,9 @@ time can lead to a loss of quorum, and possible data loss.
Starting with a cluster with one or more services configured, determine whether some networks
may require update in order to function correctly after an Docker Engine - Enterprise 18.09 upgrade.
1. SSH into a manager node.
1. SSH into a manager node on a cluster where your applications are running.
2. Fetch and deploy a service that would exhaust IP addresses in one of its overlay networks, such as (https://raw.githubusercontent.com/ctelfer/moby-lb-upgrade-test/master/low_addrs/docker-compose.yml)
3. Run the following:
2. Run the following:
```
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock docker/ip-util-check
@ -142,22 +137,22 @@ If the network is in danger of exhaustion, the output will show similar warnings
WARNING: network could exhaust IP addresses if the cluster scales to 9 or more nodes
```
3. Once you determine all networks are sized appropriately, start the upgrade on the Swarm managers.
#### Triage and fix an upgrade that exhausted IP address space
Starting with a cluster with services that exhaust their overlay address space in Docker Engine - Enterprise 18.09, adjust the deployment to fix this issue.
With an exhausted network, you can triage it using the following steps.
1. SSH into a manager node.
1. SSH into a manager node on a cluster where your applications are running.
2. Fetch and deploy a service that exhausts IP addresses in one of its overlay networks such as (https://raw.githubusercontent.com/ctelfer/moby-lb-upgrade-test/master/exhaust_addrs_3_nodes/docker-compose.yml).
3. Check the `docker service ls` output. It will diplay the service that is unable to completely fill all its replicas such as:
2. Check the `docker service ls` output. It will diplay the service that is unable to completely fill all its replicas such as:
```
ID NAME MODE REPLICAS IMAGE PORTS
wn3x4lu9cnln ex_service replicated 19/24 nginx:latest
```
4. Use `docker service ps ex_service` to find a failed replica such as:
3. Use `docker service ps ex_service` to find a failed replica such as:
```
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
@ -166,7 +161,7 @@ i64lee19ia6s \_ ex_service.11 nginx:latest tk1706-ubuntu-1
...
```
5. Examine the error using `docker inspect`. In this example, the `docker inspect i64lee19ia6s` output shows the error in the `Status.Err` field:
4. Examine the error using `docker inspect`. In this example, the `docker inspect i64lee19ia6s` output shows the error in the `Status.Err` field:
```
...
@ -185,9 +180,11 @@ i64lee19ia6s \_ ex_service.11 nginx:latest tk1706-ubuntu-1
...
```
6. Adjust the `- subnet:` field in `docker-compose.yml` to have a larger subnet such as `- subnet: 10.1.1.0/22`.
5. Adjust your network subnet in the deployment manifest, such that it has enough IPs required by the application.
7. Remove the original service and re-deploy with the new compose file. Confirm the adjusted service deployed successfully.
6. Redeploy the application.
7. Confirm the adjusted service deployed successfully.
## Manager Upgrades When Moving to Docker Engine - Enterprise 18.09 and later
@ -210,10 +207,11 @@ restriction only applies to managers and not worker nodes.
Start by draining the node so that services get scheduled in another node and
continue running without downtime.
For that, run this command on a manager node:
```
docker node update --availability drain <node>
$ docker node update --availability drain <node>
```
### Perform the upgrade