From 98401b6c39e55318a75d95beb4ceea04d8a72e25 Mon Sep 17 00:00:00 2001
From: Alex Mavrogiannis
Date: Tue, 8 Nov 2016 16:46:16 -0800
Subject: [PATCH] update documentation on backup/restore

---
 .../backups-and-disaster-recovery.md | 54 +++++++++++++++----
 1 file changed, 45 insertions(+), 9 deletions(-)

diff --git a/datacenter/ucp/2.0/high-availability/backups-and-disaster-recovery.md b/datacenter/ucp/2.0/high-availability/backups-and-disaster-recovery.md
index 55ce512471..7ce9a7900d 100644
--- a/datacenter/ucp/2.0/high-availability/backups-and-disaster-recovery.md
+++ b/datacenter/ucp/2.0/high-availability/backups-and-disaster-recovery.md
@@ -25,7 +25,7 @@ UCP on that node, and streams it to stdout.
 
 To create a consistent backup, the backup command temporarily stops the UCP
 containers running on the node where the backup is being performed. User
-containers are not affected by this.
+containers and services are not affected by this.
 
 To have minimal impact on your business, you should:
 
@@ -68,23 +68,59 @@ $ docker run --rm -i --name ucp \
   docker/ucp restore < backup.tar
 ```
 
+The restore command may also be invoked in interactive mode:
+
+```bash
+$ docker run --rm -i --name ucp \
+  -v /var/run/docker.sock:/var/run/docker.sock \
+  -v /path/to/backup.tar:/config/backup.tar \
+  docker/ucp restore -i
+```
+
 ## Restore your cluster
 
-Configuring UCP to have multiple controller nodes allows you tolerate a certain
-amount of node failures. If multiple nodes fail at the same time, causing the
-cluster to go down, you can use an existing backup to recover.
+The restore command can be used to create a new UCP cluster from a backup file.
+After the restore operation is complete, the following data will be copied from
+the backup file:
+
+* Users, Teams, and Permissions.
+* Cluster Configuration, such as the default Controller Port or the KV store
+  timeout.
+* DDC Subscription License.
+* Options for Scheduling, Content Trust, Authentication Methods, and Reporting.
+
+The restore operation may be performed against any Docker Engine, regardless of
+swarm membership, as long as the target Engine is not already managed by a UCP
+installation. If the Docker Engine is already part of a swarm, that swarm and
+all deployed containers and services will be managed by UCP after the restore
+operation completes.
 
 As an example, if you have a cluster with three controller nodes, A, B, and C,
 and your most recent backup was of node A:
 
-1. Stop controllers B and C with the `stop` command,
-2. Restore controller A,
-3. Uninstall UCP from controllers B and C,
-4. Join nodes B and C as replica controllers to the cluster.
+1. Uninstall UCP from the swarm using the `uninstall-ucp` operation, as
+   sketched below.
+2. Restore one of the swarm managers, such as node B, using the most recent
+   backup from node A.
+3. Wait for all nodes of the swarm to become healthy UCP nodes.
 
-You should now have your cluster up and running.
+You should now have your UCP cluster up and running.
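+
+As a rough sketch, steps 1 and 2 map to commands along the following lines. The
+`-it` and `--interactive` flags on the uninstall and the backup path are
+illustrative assumptions; the restore invocation mirrors the example shown
+earlier on this page:
+
+```bash
+# Step 1: remove UCP from the swarm (run against a manager node).
+$ docker run --rm -it --name ucp \
+  -v /var/run/docker.sock:/var/run/docker.sock \
+  docker/ucp uninstall-ucp --interactive
+
+# Step 2: restore UCP on one of the swarm managers, for example node B,
+# using the most recent backup taken from node A.
+$ docker run --rm -i --name ucp \
+  -v /var/run/docker.sock:/var/run/docker.sock \
+  docker/ucp restore < /path/to/backup.tar
+```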
 
+Additionally, in the event that half or more of the controller nodes are lost
+and cannot be recovered to a healthy state, the system can only be restored
+through the following disaster recovery procedure. Note that this procedure is
+not guaranteed to succeed with no loss of either swarm services or UCP
+configuration data:
+
+1. On one of the remaining manager nodes, perform `docker swarm init
+   --force-new-cluster`. This instantiates a new single-manager swarm by
+   recovering as much state as possible from the existing manager. This is a
+   disruptive operation and any existing tasks will be either terminated or
+   suspended.
+2. Obtain a backup of one of the remaining manager nodes if one is not already
+   available.
+3. Perform a restore operation on the recovered swarm manager node.
+4. For all other nodes of the cluster, perform a `docker swarm leave --force`
+   and then a `docker swarm join` operation with the cluster's new join-token,
+   as sketched below.
+5. Wait for all nodes of the swarm to become healthy UCP nodes.
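+
+The swarm commands referenced in steps 1 and 4 can be summarized in the rough
+sketch below. The `<worker-token>` and `<manager-ip>` values are placeholders,
+and you would use `docker swarm join-token manager` instead for nodes that
+should rejoin as managers:
+
+```bash
+# Step 1: on a surviving manager node, rebuild a single-manager swarm,
+# recovering as much local state as possible.
+$ docker swarm init --force-new-cluster
+
+# Step 4: print the new join-token for worker nodes.
+$ docker swarm join-token worker
+
+# Step 4: on every other node, leave the old swarm and rejoin the new one.
+$ docker swarm leave --force
+$ docker swarm join --token <worker-token> <manager-ip>:2377
+```
+
 ## Where to go next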