Initial work on upgrade.md for issue 802.

Anne Henmi 2018-10-30 18:10:57 -06:00
parent 9f0cc20d1d
commit cfee407c37
1 changed files with 108 additions and 3 deletions


@ -59,9 +59,12 @@ This ensures that your containers are started automatically after the upgrade.
To ensure that workloads running as Swarm services have no downtime, you need to:
1. Determine if the network is in danger of exhaustion, and if so:
   a. Triage and fix an upgrade that exhausted IP address space, or
   b. Upgrade a service network live to add IP addresses
2. Drain the node you want to upgrade so that services get scheduled in another node.
3. Upgrade the Docker Engine on that node.
4. Make the node available again.
If you do this sequentially for every node, you can upgrade with no
application downtime.
@ -69,6 +72,108 @@ When upgrading manager nodes, make sure the upgrade of a node finishes before
you start upgrading the next node. Upgrading multiple manager nodes at the same
time can lead to a loss of quorum, and possible data loss.
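
The quorum warning above follows from majority arithmetic. A minimal sketch, assuming the usual majority rule for a Raft-style manager set (more than half of the managers must be reachable); the function names are illustrative and not part of any Docker API:

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be offline at once without losing the majority."""
    return managers - quorum(managers)


# Taking more managers offline than tolerated_failures() allows
# (for example, by upgrading them in parallel) loses quorum.
for n in (1, 3, 5, 7):
    print(f"{n} managers: quorum {quorum(n)}, can lose {tolerated_failures(n)}")
```

With three managers, only one can be down at a time, which is why each manager upgrade should finish before the next one starts.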
### Determine if the network is in danger of exhaustion
Starting with a cluster with one or more services configured, determine whether some networks
may require an update in order to function correctly after an 18.09 upgrade.
1. SSH into a manager node.
2. Fetch and deploy a service that would exhaust IP addresses in one of its overlay networks.
3. Check the `docker service ls` output. It displays any service that cannot fill all of its replicas, such as:
```
ID NAME MODE REPLICAS IMAGE PORTS
wn3x4lu9cnln ex_service replicated 19/24 nginx:latest
```
4. Use `docker service ps ex_service` to find a failed replica such as:
```
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
...
i64lee19ia6s \_ ex_service.11 nginx:latest tk1706-ubuntu-1 Shutdown Rejected 7 minutes ago "node is missing network attac…"
...
```
5. Examine the error using `docker inspect`. In this example, the `docker inspect i64lee19ia6s` output shows the error in the `Status.Err` field:
```
...
"Status": {
"Timestamp": "2018-08-24T21:03:37.885405884Z",
"State": "rejected",
"Message": "preparing",
"Err": "node is missing network attachments, ip addresses may be exhausted",
"ContainerStatus": {
"ContainerID": "",
"PID": 0,
"ExitCode": 0
},
"PortStatus": {}
},
...
```
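
The checks in the steps above can be sketched as a small script. This is an illustration only: it assumes the service list was captured with `docker service ls --format "{{.Name}} {{.Replicas}}"` and that the `docker inspect` output was saved as text. The helper names and the `other_service` entry are hypothetical.

```python
import json


def under_replicated(service_ls_output: str) -> list:
    """Return (name, running, desired) for services below their desired replica count.

    Expects lines of the form '<name> <running>/<desired>'.
    """
    short = []
    for line in service_ls_output.splitlines():
        parts = line.split()
        if len(parts) < 2 or "/" not in parts[1]:
            continue  # skip blank or unexpected lines
        running, desired = parts[1].split("/", 1)
        if running.isdigit() and desired.isdigit() and int(running) < int(desired):
            short.append((parts[0], int(running), int(desired)))
    return short


def exhaustion_errors(inspect_output: str) -> list:
    """Return Status.Err values that mention exhausted IP addresses.

    `docker inspect` prints a JSON array with one object per inspected ID.
    """
    errors = []
    for obj in json.loads(inspect_output):
        err = obj.get("Status", {}).get("Err", "")
        if "ip addresses may be exhausted" in err:
            errors.append(err)
    return errors


# Sample data modeled on the output shown above; other_service is made up.
sample_ls = "ex_service 19/24\nother_service 3/3\n"
sample_inspect = json.dumps([{
    "Status": {
        "State": "rejected",
        "Err": "node is missing network attachments, ip addresses may be exhausted",
    }
}])

print(under_replicated(sample_ls))
print(exhaustion_errors(sample_inspect))
```

A service that shows up in both checks is a likely candidate for one of the two fixes described next.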
#### Triage and fix an upgrade that exhausted IP address space
Starting with a cluster with services that exhaust their overlay address space in 18.09, adjust the deployment to fix this issue.
1. Adjust the `- subnet:` field in `docker-compose.yml` to have a larger subnet such as `- subnet: 10.1.1.0/22`.
2. Remove the original service and re-deploy with the new compose file. Confirm the adjusted service deployed successfully.
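
To judge how much room a larger subnet gives, the total address count can be computed from the prefix length. A rough sketch using Python's standard `ipaddress` module; the `/24` comparison value is an assumption (the original subnet size is not stated here), and the totals include addresses that Docker reserves, so the usable count is somewhat lower.

```python
import ipaddress


def subnet_capacity(cidr: str) -> int:
    """Total number of addresses in the block, including reserved ones."""
    return ipaddress.ip_network(cidr, strict=False).num_addresses


def normalized(cidr: str) -> str:
    """The block with any host bits cleared, e.g. for display or comparison."""
    return str(ipaddress.ip_network(cidr, strict=False))


def is_aligned(cidr: str) -> bool:
    """True if the address part is the true network address for its prefix."""
    try:
        ipaddress.ip_network(cidr, strict=True)
        return True
    except ValueError:
        return False


print(subnet_capacity("10.1.1.0/24"))  # assumed original size
print(subnet_capacity("10.1.1.0/22"))  # the larger subnet from step 1
print(is_aligned("10.1.1.0/22"), normalized("10.1.1.0/22"))
```

Strict parsing treats `10.1.1.0/22` as having host bits set; the block it falls in starts at `10.1.0.0`. Whether Docker accepts the unaligned form is not verified here, so writing the aligned address may be the safer choice.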
#### Upgrade a service network live to add IP addresses
Identify a subnet with few remaining IP addresses in a live service and upgrade the network live to add IP addresses.
1. SSH into a manager node.
2. Fetch and deploy a service that has very few IP addresses available in one of its overlay networks.
3. Run the following to determine if the subnet is near capacity:
```
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock ctelfer/ip-util-check
```
4. Run the following to create a new subnet for the services on the overloaded subnet XXX. Substitute the overloaded network name for XXX.
```
$ docker network create -d overlay --subnet=10.252.0.0/8 XXX_bump_addrs
```
5. Run the following for each service to add the new network to the service.
```
$ docker service update --detach=false --network-add XXX_bump_addrs ex_serviceY
```
6. Run the following for each service attached to XXX to remove the overloaded network from the service.
```
$ docker service update --detach=false --network-rm XXX ex_serviceY
```
7. Run the following to remove the now unused network.
```
$ docker network rm XXX
```
8. Repeat the process: create another network with fresh address space, but give it the same name as the original
overloaded network (XXX), add it to each service, and then remove the `XXX_bump_addrs` network from each service.
This leaves all services attached to a network named XXX, but with an increased pool of addresses.
9. Run the following to confirm that subnet allocations are satisfactory.
```
$ docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock ctelfer/ip-util-check
```
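
The order of the commands in this procedure matters, so it can help to generate the full sequence before running anything. The sketch below only builds command strings from the commands shown above; it does not call Docker. The function name and the `temp_subnet` and `final_subnet` parameters are hypothetical, and the two subnets are assumed not to overlap each other or any existing network.

```python
def live_network_swap(network: str, services: list,
                      temp_subnet: str, final_subnet: str) -> list:
    """Return the ordered docker commands for swapping `network` onto new address space."""
    temp = f"{network}_bump_addrs"
    update = "docker service update --detach=false"
    cmds = []
    # Create the temporary network and move every service onto it.
    cmds.append(f"docker network create -d overlay --subnet={temp_subnet} {temp}")
    cmds += [f"{update} --network-add {temp} {svc}" for svc in services]
    cmds += [f"{update} --network-rm {network} {svc}" for svc in services]
    # Remove the overloaded network, then recreate it under the same name.
    cmds.append(f"docker network rm {network}")
    cmds.append(f"docker network create -d overlay --subnet={final_subnet} {network}")
    cmds += [f"{update} --network-add {network} {svc}" for svc in services]
    cmds += [f"{update} --network-rm {temp} {svc}" for svc in services]
    return cmds


# Example values are placeholders, not recommendations.
for cmd in live_network_swap("XXX", ["ex_service1", "ex_service2"],
                             "10.252.0.0/16", "10.253.0.0/16"):
    print(cmd)
```

Printing the plan first makes it easy to review that each service is attached to the replacement network before the old one is detached.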
### Drain the node
Start by draining the node so that services get scheduled in another node and