From 0fe7f7973af084fd0364f14ecf142703bc52af11 Mon Sep 17 00:00:00 2001
From: Joao Fernandes
Date: Fri, 15 Apr 2016 12:08:56 -0700
Subject: [PATCH] Adds docs for backup and restore

---
 .../backups-and-disaster-recovery.md | 88 +++++++++++++++++++
 high-availability/understand_ha.md   | 28 ------
 2 files changed, 88 insertions(+), 28 deletions(-)
 create mode 100644 high-availability/backups-and-disaster-recovery.md

diff --git a/high-availability/backups-and-disaster-recovery.md b/high-availability/backups-and-disaster-recovery.md
new file mode 100644
index 0000000000..e8733090e1
--- /dev/null
+++ b/high-availability/backups-and-disaster-recovery.md
@@ -0,0 +1,88 @@
+
+
+# Backups and disaster recovery
+
+When you decide to start using Docker Universal Control Plane in a production
+setting, you should [configure it for high availability](understand_ha.md).
+
+The next step is creating a backup policy and disaster recovery plan.
+
+## Backup policy
+
+Docker UCP nodes persist data using [named volumes](../architecture.md):
+
+* **Controller nodes** persist cluster configurations, certificates, and keys
+used to issue certificates and user bundles. This data is replicated on every
+controller node in the cluster.
+* **Nodes** are stateless. They only store certificates for mutual TLS, which
+can be regenerated.
+
+As part of your backup policy, you should regularly create backups of the
+controller nodes. Since the nodes used for running user containers don't
+persist data, you can decide not to create any backups for them.
+
+To perform a backup of a UCP controller node, use the `docker/ucp backup`
+command. This creates a tar archive with the contents of the volumes used by
+UCP on that node, and streams it to stdout.
+
+To create a consistent backup, the backup command temporarily stops the UCP
+containers running on the node where the backup is being performed. User
+containers are not affected by this.
+
+To have minimal impact on your business, you should:
+
+* Schedule the backup to take place outside business hours.
+* Configure UCP for high availability. This allows load-balancing user requests
+across multiple UCP controller nodes.
+
+## Backup UCP data
+
+To learn about the options available on the `docker/ucp backup` command, you can
+check the reference documentation, or run:
+
+```bash
+$ docker run --rm docker/ucp backup --help
+```
+
+When creating a backup, the resulting tar archive contains sensitive information
+like private keys. To ensure this information is kept private, you should run
+the backup command with the `--passphrase` option. This encrypts
+the backup with a passphrase of your choice.
+
+The example below shows how to create a backup of a UCP controller node:
+
+```bash
+# Create a backup, encrypt it, and store it in /tmp/backup.tar
+$ docker run --rm -i --name ucp \
+  -v /var/run/docker.sock:/var/run/docker.sock \
+  docker/ucp backup --interactive --passphrase "secret" > /tmp/backup.tar
+
+Do you want proceed with the backup? (y/n):
+$ y
+
+INFO[0000] Temporarily Stopping local UCP containers to ensure a consistent backup
+INFO[0000] Beginning backup
+INFO[0001] Backup completed successfully
+INFO[0002] Resuming stopped UCP containers
+
+# Decrypt the backup and list its contents
+$ gpg --decrypt /tmp/backup.tar | tar --list
+
+Enter passphrase: secret
+
+./ucp-client-root-ca/
+./ucp-client-root-ca/cert.pem
+./ucp-client-root-ca/config.json
+./ucp-client-root-ca/key.pem
+./ucp-cluster-root-ca/
+# output snipped
+```
diff --git a/high-availability/understand_ha.md b/high-availability/understand_ha.md
index 6fe954aaa5..95b9c8b4a8 100644
--- a/high-availability/understand_ha.md
+++ b/high-availability/understand_ha.md
@@ -102,34 +102,6 @@ If an external load balancer is not used, system administrators should note the
 IP/hostname of the primary and all controller replicas. In this way, an
 administrator can access them when needed.
 
-## Backup policy
-
-UCP configurations are stored using a key-value store that is replicated across
-the controller and replica nodes. This makes the cluster tolerant to failures.
-
-The data of the key-value store and the certificates used for TLS are persisted
-using volumes. [These volumes](../architecture.md#volumes)
-are created when installing UCP on a node, and when joining nodes to a cluster.
-
-On UCP version 1.0, the CAs present in the controller node are not replicated
-on other nodes:
-
-* Swarm CA:
-    * Used for admin cert bundle generation,
-    * Used for adding hosts to the cluster.
-* UCP CA:
-    * Used for user bundle generation,
-    * Used to sign certs for new replica nodes.
-
-If the controller node fails, replica nodes will keep the system state and
-still be able to handle user requests. However during a controller node failure
-it's not possible to:
-
-* Download new certificate bundles for admin and non-admin users. Existing bundles will still work,
-* Add more nodes to the cluster. Existing nodes will continue to operate.
-
-You should keep a backup of these volumes, so that you can restore the CAs used
-in the controller node, in case of failure.
 
 ## Where to go next