Backup, restore, and disaster recovery refactor (#1094)

* Consolidation/cleanup

* Minor wording update

* Clean up files and fix broken links
This commit is contained in:
Maria Bermudez 2019-05-20 10:43:39 -07:00 committed by Maria Bermudez
parent 1591d9615f
commit 740512042a
11 changed files with 1256 additions and 48 deletions

View File

@ -1202,20 +1202,42 @@ manuals:
title: About Docker EE
- title: Try Docker EE on the cloud
path: https://trial.docker.com
- path: /ee/docker-ee-architecture/
title: Docker EE Architecture
- path: /ee/supported-platforms/
title: Supported platforms
nosync: true
- path: /ee/end-to-end-install/
title: Deploy Docker EE standard
- path: /ee/backup/
title: Backup Docker EE
title: Deploy Docker EE standard
- sectiontitle: Back up Docker Enterprise
section:
- path: /ee/admin/backup/
title: Overview
- path: /ee/admin/backup/back-up-swarm/
title: Back up Docker Swarm
- path: /ee/admin/backup/back-up-ucp/
title: Back up UCP
- path: /ee/admin/backup/back-up-dtr/
title: Back up DTR
- sectiontitle: Restore Docker Enterprise
section:
- path: /ee/admin/restore/
title: Overview
- path: /ee/admin/restore/restore-swarm/
title: Restore Docker Swarm
- path: /ee/admin/restore/restore-ucp/
title: Restore UCP
- path: /ee/admin/restore/restore-dtr/
title: Restore DTR
- sectiontitle: Disaster Recovery
section:
- path: /ee/admin/disaster-recovery/
title: Overview
- path: /ee/upgrade/
title: Upgrade Docker EE
- path: /ee/docker-ee-architecture/
title: Docker EE Architecture
title: Upgrade Docker Enterprise
- path: /ee/telemetry/
title: Manage usage data collection
- sectiontitle: Docker EE Engine
- sectiontitle: Engine
section:
- path: /ee/supported-platforms/
title: Install Docker EE Engine
@ -1348,8 +1370,6 @@ manuals:
title: Troubleshoot with logs
- path: /ee/ucp/admin/monitor-and-troubleshoot/troubleshoot-configurations/
title: Troubleshoot configurations
- path: /ee/ucp/admin/backups-and-disaster-recovery/
title: Backups and disaster recovery
- title: CLI reference
path: /reference/ucp/3.1/cli/
nosync: true

View File

@ -0,0 +1,190 @@
---
title: Back up DTR
description: Learn how to create a DTR backup
keywords: enterprise, backup, dtr, disaster recovery
redirect_from:
- /ee/dtr/admin/disaster-recovery/create-a-backup/
---
Backups do not cause downtime for DTR.
## DTR backup contents
All metadata and authorization (authZ) information for a given DTR cluster is backed up.
| Data | Backed up | Description |
|:-----------------------------------|:----------|:---------------------------------------------------------------|
| Configurations | yes | DTR settings and cluster configurations |
| Repository metadata | yes | Metadata such as image architecture, repositories, images deployed, and size |
| Access control to repos and images | yes | Data about who has access to which images and repositories |
| Notary data | yes | Signatures and digests for images that are signed |
| Scan results | yes | Information about vulnerabilities in your images |
| Certificates and keys | yes | Certificates, public keys, and private keys that are used for mutual TLS communication |
| Image content | no | The images you push to DTR. Depending on the configuration, image content can be stored on the file system of the node running DTR or on another storage system. It must be backed up separately; the procedure depends on the DTR configuration |
| Users, orgs, teams | no | Create a UCP backup to back up this data |
| Vulnerability database | no | Can be redownloaded after a restore |
This data is persisted on the host running DTR, using named volumes.
[Learn more about DTR named volumes](/ee/dtr/architecture/).
## Perform DTR backup
You should always create backups from the same DTR replica, to ensure a smoother
restore. If you have not previously performed a backup, the web interface displays a warning:
![](/ee/dtr/images/backup-warning.png)
To create a DTR backup, perform the following steps:
1. Run [docker/dtr backup](/reference/dtr/{{site.dtr_version}}/cli/backup/)
2. [Back up DTR image content](#back-up-image-content)
3. [Back up DTR metadata](#back-up-dtr-metadata)
4. [Verify your backup](#verify-your-backup)
### Run the DTR backup command (CLI)
#### Find your replica ID
Because you need your DTR replica ID during a backup, use one of the following methods to determine it:
##### UCP web interface
You can find the list of replicas by navigating to **Shared Resources > Stacks** or **Swarm > Volumes** (when using [swarm mode](/engine/swarm/)) on the UCP web interface.
##### UCP client bundle
From a terminal [using a UCP client bundle](/ee/ucp/user-access/cli/), run:
{% raw %}
```bash
docker ps --format "{{.Names}}" | grep dtr
# Lists DTR containers as <node>/<component>-<replicaID>, for example:
# node-1/dtr-api-a1640e1c15b6
```
{% endraw %}
##### SSH access
Another way to determine the replica ID is to SSH into a DTR node and run the following:
{% raw %}
```bash
REPLICA_ID=$(docker inspect -f '{{.Name}}' $(docker ps -q -f name=dtr-rethink) | cut -f 3 -d '-') && echo $REPLICA_ID
```
{% endraw %}
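Both methods rely on the same parsing: DTR container names embed the replica ID as the third `-`-separated field. A minimal sketch, using a hypothetical container name in place of live `docker` output:

```bash
# Hypothetical container name, as docker would report it (for illustration only)
name="dtr-rethinkdb-a1640e1c15b6"

# The third '-'-separated field is the 12-character replica ID
REPLICA_ID=$(echo "$name" | cut -f 3 -d '-')
echo "$REPLICA_ID"
```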
#### Back up image content
Since you can configure the storage backend that DTR uses to store images,
the way you back up images depends on the storage backend you're using.
If you've configured DTR to store images on the local file system or an NFS mount,
you can back up the images by using SSH to log in to a DTR node
and creating a tar archive of the [dtr-registry volume](/ee/dtr/architecture/):
{% raw %}
```none
sudo tar -cf {{ image_backup_file }} \
-C /var/lib/docker/volumes/ dtr-registry-<replica-id>
```
{% endraw %}
If you're using a different storage backend, follow the best practices
recommended for that system.
#### Back up DTR metadata
To create a DTR backup, load your UCP client bundle, and run the following
command, replacing the placeholders with real values:
```bash
read -sp 'ucp password: ' UCP_PASSWORD;
```
This prompts you for the UCP password. Next, run the following to back up your DTR metadata and save the result into a tar archive. You can learn more about the supported flags in
the [reference documentation](/reference/dtr/2.6/cli/backup/).
```bash
docker run --log-driver none -i --rm \
--env UCP_PASSWORD=$UCP_PASSWORD \
{{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} backup \
--ucp-url <ucp-url> \
--ucp-insecure-tls \
--ucp-username <ucp-username> \
--existing-replica-id <replica-id> > {{ metadata_backup_file }}
```
Where:
* `<ucp-url>` is the URL you use to access UCP.
* `<ucp-username>` is the username of a UCP administrator.
* `<replica-id>` is the ID of the DTR replica to back up.
By default, the backup command doesn't stop the DTR replica being backed up.
This means you can take frequent backups without affecting your users.
You can use the `--offline-backup` option to stop the DTR replica while taking
the backup. If you do this, remove the replica from the load balancing pool.
Because the backup contains sensitive information,
such as private keys, you can encrypt the backup by running:
```none
gpg --symmetric {{ metadata_backup_file }}
```
This prompts you for a password and creates an encrypted copy of the backup
file.
## Verify your backup
To validate that the backup was correctly performed, you can print the contents
of the tar file created. The backup of the images should look like:
```none
tar -tf {{ image_backup_file }}
dtr-backup-v{{ page.dtr_version }}/
dtr-backup-v{{ page.dtr_version }}/rethink/
dtr-backup-v{{ page.dtr_version }}/rethink/layers/
```
And the backup of the DTR metadata should look like:
```none
tar -tf {{ metadata_backup_file }}
# The archive should look like this
dtr-backup-v{{ page.dtr_version }}/
dtr-backup-v{{ page.dtr_version }}/rethink/
dtr-backup-v{{ page.dtr_version }}/rethink/properties/
dtr-backup-v{{ page.dtr_version }}/rethink/properties/0
```
If you've encrypted the metadata backup, you can use:
```none
gpg -d {{ metadata_backup_file }} | tar -t
```
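The same listing check can be rehearsed on a throwaway archive. This sketch (using hypothetical scratch paths, not a real backup) mimics the metadata layout and verifies that the expected entry is present:

```bash
# Build a scratch directory mimicking the backup layout (hypothetical, for illustration)
demo=$(mktemp -d)
mkdir -p "$demo/dtr-backup-demo/rethink/properties"
echo 0 > "$demo/dtr-backup-demo/rethink/properties/0"

# Create and then list the archive, as you would with a real backup file
tar -cf "$demo/backup-demo.tar" -C "$demo" dtr-backup-demo
tar -tf "$demo/backup-demo.tar" | grep -q 'rethink/properties/0' && echo "backup archive OK"
```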
You can also create a backup of a UCP cluster and restore it into a new
cluster. Then restore DTR on that new cluster to confirm that everything is
working as expected.
### Where to go next
- [Configure your storage backend](/ee/dtr/admin/configure/external-storage/)
- [Switch your storage backend](/ee/dtr/admin/configure/external-storage/storage-backend-migration/)
- [Use NFS](/ee/dtr/admin/configure/external-storage/nfs/)
- [Use S3](/ee/dtr/admin/configure/external-storage/s3/)
- CLI reference pages
- [docker/dtr install](/reference/dtr/2.6/cli/install/)
- [docker/dtr reconfigure](/reference/dtr/2.6/cli/reconfigure/)
- [docker/dtr restore](/reference/dtr/2.6/cli/restore/)

View File

@ -0,0 +1,84 @@
---
title: Back up Docker Swarm
description: Learn how to create a backup of Docker Swarm
keywords: enterprise, backup, swarm
---
Docker manager nodes store the swarm state and manager logs in the `/var/lib/docker/swarm/` directory. Swarm raft logs contain crucial information for re-creating Swarm-specific resources, including services, secrets, configurations, and node cryptographic identity. In Docker 1.13 and higher, this data includes the keys used to encrypt the raft logs. Without these keys, you cannot restore the swarm.
You must perform a manual backup on each manager node, because logs contain node IP address information and are not transferable to other nodes. If you do not back up the raft logs, you cannot verify workloads or Swarm resource provisioning after restoring the cluster.
> You can avoid performing a Swarm backup by storing stacks, service definitions, secrets, and network definitions in a *source code management* or *config management* tool.
## Swarm backup contents
| Data | Description | Backed up |
| :------------------|:-------------------------------------------------------------------------------------|:----------|
| Raft keys | Used to encrypt communication among Swarm nodes and to encrypt and decrypt Raft logs | yes
| Membership | List of the nodes in the cluster | yes
| Services | Stacks and services stored in Swarm-mode | yes
| Networks (overlay) | The overlay networks created on the cluster | yes
| Configs | The configs created in the cluster | yes
| Secrets | Secrets saved in the cluster | yes
| Swarm unlock key | **Must be saved in a password manager!** | no
## Procedure
1. If `auto-lock` is enabled, retrieve your Swarm unlock key and store it
in a safe location; you need it to restore the swarm from backup.
If you are unsure, read
[Lock your swarm to protect its encryption key](/engine/swarm/swarm_manager_locking.md).
2. Because you must stop the engine of the manager node before performing the backup, having three manager
nodes is recommended for high availability (HA). For a cluster to be operational, a majority of managers
must be online. If fewer than three managers exist, the cluster is unavailable during the backup.
> **Note**: While a manager is shut down, your swarm is more vulnerable to
> losing the quorum if further nodes are lost. A loss of quorum means that the swarm is unavailable
> until quorum is recovered. Quorum is only recovered when more than 50% of the nodes are again available.
> If you regularly take down managers to do backups, consider running a 5-manager swarm, so that you
> can lose an additional manager while the backup is running without disrupting services.
3. Select a manager node. Try not to select the leader in order to avoid a new election inside the cluster:
```bash
docker node ls -f "role=manager" | tail -n+2 | grep -vi leader
```
> Optional: Store the Docker version in a variable for easy addition to your backup name.
{% raw %}
```bash
ENGINE=$(docker version -f '{{.Server.Version}}')
```
{% endraw %}
4. Stop the Docker Engine on the manager before backing up the data, so that no data is changed during the backup:
```bash
systemctl stop docker
```
5. Back up the entire `/var/lib/docker/swarm` folder:
```bash
tar cvzf "/tmp/swarm-${ENGINE}-$(hostname -s)-$(date +%s%z).tgz" /var/lib/docker/swarm/
```
Note: _You can decode the Unix epoch in the filename by typing `date -d @timestamp`._ For example:
```none
date -d @1531166143
Mon Jul 9 19:55:43 UTC 2018
```
6. Restart the manager Docker Engine:
```bash
systemctl start docker
```
7. Repeat the previous steps (except step 1) on each manager node.
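The archive naming convention in step 5 can be exercised without touching a live engine. A sketch, substituting a scratch directory for `/var/lib/docker/swarm` and a hard-coded version string for the `docker version` call:

{% raw %}
```bash
ENGINE="19.03.1"   # in practice: ENGINE=$(docker version -f '{{.Server.Version}}')
SRC=$(mktemp -d)   # stand-in for /var/lib/docker/swarm
touch "$SRC/raft-log-demo"

# Same naming convention as step 5: engine version, short hostname, Unix epoch
OUT="/tmp/swarm-${ENGINE}-$(hostname -s)-$(date +%s).tgz"
tar czf "$OUT" -C "$(dirname "$SRC")" "$(basename "$SRC")"

# The archive lists cleanly, and the epoch in the name decodes with `date -d @<epoch>`
tar tzf "$OUT" >/dev/null && echo "created $OUT"
```
{% endraw %}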
### Where to go next
- [Back up UCP](back-up-ucp)

View File

@ -0,0 +1,244 @@
---
title: Back up UCP
description: Learn how to create a backup of UCP
keywords: enterprise, backup, ucp
redirect_from:
- /ee/ucp/admin/backups-and-disaster-recovery/
---
UCP backups no longer require pausing the reconciler and deleting UCP containers, and backing up a UCP manager does not disrupt the manager's activities.
Because UCP stores the same data on all manager nodes, you only need to back up a single UCP manager node.
User resources, such as services, containers, and stacks are not affected by this
operation and continue operating as expected.
## Limitations
- A backup cannot be used to restore a cluster running a newer version of Docker Enterprise. For example, a backup taken on version N cannot be restored on version N+1.
- More than one backup at the same time is not supported. If a backup is attempted while another backup is in progress, or if two backups are scheduled at the same time, a message is displayed to indicate that the second backup failed because another backup is running.
- For crashed clusters, backup capability is not guaranteed. Perform regular backups to avoid this situation.
- UCP backup does not include swarm workloads.
## UCP backup contents
Backup contents are stored in a `.tar` file. Backups contain UCP configuration metadata to re-create configurations such as **Administration Settings** values (for example, LDAP and SAML settings) and RBAC configurations (collections, grants, roles, users, and more):
| Data | Description | Backed up |
| :---------------------|:-----------------------------------------------------------------------------------|:----------|
| Configurations | UCP configurations, including the Docker EE license and Swarm and client CAs | yes
| Access control | Permissions for teams to swarm resources, including collections, grants, and roles | yes
| Certificates and keys | Certificates and public and private keys used for authentication and mutual TLS communication | yes
| Metrics data | Monitoring data gathered by UCP | yes
| Organizations | Users, teams, and organizations | yes
| Volumes | All [UCP named volumes](/ee/ucp/ucp-architecture/#volumes-used-by-ucp/), including all UCP component certificates and data. [Learn more about UCP named volumes](/ee/ucp/ucp-architecture/). | yes
| Overlay Networks | Swarm-mode overlay network definitions, including port information | no
| Configs, Secrets | Create a Swarm backup to back up this data | no
| Services | Stacks and services are stored in Swarm-mode or SCM/Config Management | no
**Note**: Because Kubernetes stores the state of resources in `etcd`, a backup of `etcd` is sufficient for stateless backups, as described in [Backing up an etcd cluster](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster).
## Data not included in the backup
* `ucp-metrics-data`: holds the metrics server's data.
* `ucp-node-certs`: holds certs used to lock down UCP system components.
* Routing mesh settings. Interlock L7 ingress configuration information is not captured in UCP backups. A manual backup and restore process is possible and should be performed.
## Kubernetes settings, data, and state
UCP backups include all Kubernetes declarative objects (pods, deployments, replicasets, configs, and so on), including secrets.
> **Note**: Kubernetes volumes and Kubernetes node labels are not backed up.
Upon restore, Kubernetes declarative objects are re-created. Containers are re-created and IPs are resolved.
For more information, see [Backing up an etcd cluster](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster).
## Specify a backup file
To avoid directly managing backup files, you can specify a file name and host directory on a secure, configured storage backend, such as NFS or another networked file system. The file system location is the backup folder on the manager node file system. This location must be writable by the `nobody` user, which you can ensure by changing the folder ownership to `nobody`. This operation requires administrator permissions on the manager node, and must only be run once for a given file system location.
```bash
sudo chown nobody:nogroup /path/to/folder
```
> **Important**:
- Specify a different name for each backup file. Otherwise, the existing backup file with the same name is overwritten.
- Specify a location that is mounted on a fault-tolerant file system (such as NFS) rather than the node's local disk. Otherwise, it is important to regularly move backups from the manager node's local disk to ensure adequate space for ongoing backups.
## UCP backup steps
There are several options for creating a UCP backup:
- [CLI](#create-a-ucp-backup-using-the-cli)
- [UI](#create-a-ucp-backup-using-the-ui)
- [API](#create-list-and-retrieve-ucp-backups-using-the-api)
The backup process runs on one manager node.
### Create a UCP backup using the CLI
The following example shows how to create a UCP manager node backup, encrypt it by using a passphrase, decrypt it, verify its contents, and store it on `/securelocation/backup.tar`:
1. Run the `{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup` command on a single UCP manager and include the `--file` and `--include-logs` options. This creates a tar archive with the contents of all [volumes used by UCP](/ee/ucp/ucp-architecture/)
and streams it to `stdout`. Replace `version` with the version you are currently running.
```bash
docker container run \
--log-driver none --rm \
--interactive \
--name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp:/backup \
$ORG/ucp:$TAG backup \
--file /securelocation/backup.tar \
--passphrase "secret12chars" \
--include-logs false
```
> **Note**: If you are running with Security-Enhanced Linux (SELinux) enabled, which is typical for RHEL hosts, you must include `--security-opt label=disable` in the `docker` command (replace `version` with the version you are currently running):
```bash
docker container run \
--rm \
--log-driver=none \
--security-opt label=disable \
--name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp:$version backup \
--passphrase "secret12chars" > /securelocation/backup.tar
```
> To determine whether SELinux is enabled in the engine, view the host's `/etc/docker/daemon.json` file, and search for `"selinux-enabled": true`.
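A sketch of that check, run against a hypothetical `daemon.json` written to a scratch path rather than the real `/etc/docker/daemon.json`:

```bash
# Hypothetical daemon.json, for illustration only
conf=$(mktemp)
cat > "$conf" <<'EOF'
{
  "selinux-enabled": true
}
EOF

# If the option is present and true, add --security-opt label=disable to the backup command
if grep -q '"selinux-enabled": *true' "$conf"; then
  echo "SELinux is enabled in the engine"
fi
```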
#### View log and progress information
To view backup progress and error reporting, view the contents of the `stderr` stream of the running backup container during the backup. Progress is updated for each backup step: for example, after validation, after volumes are backed up, after `etcd` is backed up, and after `rethinkDB` is backed up. Progress is not preserved after the backup has completed.
#### Verify a UCP backup
Ensure the backup is a valid tar file by listing its contents, as shown in the following example. In a valid backup file, more than 100 files are listed and the `./ucp-node-certs/key.pem` file is present.
```
$ gpg --decrypt /securelocation/backup.tar | tar --list
```
If decryption is not needed, you can list the contents by removing the `--decrypt` flag, as shown in the following example:
```
$ tar --list -f /securelocation/backup.tar
```
### Create a UCP backup using the UI
1. In the UCP UI, navigate to **Admin Settings**.
2. Select **Backup Admin**.
3. Select **Backup Now** to trigger an immediate backup.
The UI also provides the following options:
- Display the status of a running backup
- Display backup history
- View backup contents
### Create, list, and retrieve UCP backups using the API
The UCP API provides three endpoints for managing UCP backups. You must be a UCP administrator to access these API endpoints.
#### Create a UCP backup using the API
Create a backup with the `POST: /api/ucp/backup` endpoint. This is a JSON endpoint with the following arguments:
| field name | JSON data type* | description |
|:----------: |:-------: |:----------------------------------------: |
| passphrase | string | Encryption passphrase |
| noPassphrase | bool | Set to `true` if not using a passphrase |
| fileName | string | Backup file name |
| includeLogs | bool | Specifies whether to include a log file |
| hostPath | string | [File system location](#specify-a-backup-file) |
The request returns one of the following HTTP status codes, and, if successful, a backup ID.
- 200: Success
- 500: Internal server error
- 400: Malformed request (payload fails validation)
##### Example
```
$ curl -sk -H "Authorization: Bearer $AUTHTOKEN" https://$UCP_HOSTNAME/api/ucp/backup \
-X POST \
-H "Content-Type: application/json" \
--data '{"encrypted": true, "includeLogs": true, "fileName": "backup1.tar", "logFileName": "backup1.log", "hostPath": "/secure-location"}'
200 OK
```
where:
- `$AUTHTOKEN` is your authentication bearer token if using auth token identification.
- `$UCP_HOSTNAME` is your UCP hostname.
#### List all backups using the API
List existing backups with the `GET: /api/ucp/backups` endpoint. This request does not expect a payload and returns a list of backups, each as a JSON object following the schema found in the [Backup schema](#backup-schema) section.
The request returns one of the following HTTP status codes and, if successful, a list of existing backups:
- 200: Success
- 500: Internal server error
##### Example
```
curl -sk -H "Authorization: Bearer $AUTHTOKEN" https://$UCP_HOSTNAME/api/ucp/backups
[
{
"id": "0d0525dd-948a-41b4-9f25-c6b4cd6d9fe4",
"encrypted": true,
"fileName": "backup2.tar",
"logFileName": "backup2.log",
"backupPath": "/secure-location",
"backupState": "SUCCESS",
"nodeLocation": "ucp-node-ubuntu-0",
"shortError": "",
"created_at": "2019-04-10T21:55:53.775Z",
"completed_at": "2019-04-10T21:56:01.184Z"
},
{
"id": "2cf210df-d641-44ca-bc21-bda757c08d18",
"encrypted": true,
"fileName": "backup1.tar",
"logFileName": "backup1.log",
"backupPath": "/secure-location",
"backupState": "IN_PROGRESS",
"nodeLocation": "ucp-node-ubuntu-0",
"shortError": "",
"created_at": "2019-04-10T01:23:59.404Z",
"completed_at": "0001-01-01T00:00:00Z"
}
]
```
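Since each entry reports a `backupState`, a quick check for backups still in progress can be sketched as follows (assuming the response above is saved to a file; a real script would use a JSON parser such as `jq` rather than `grep`):

```bash
# Hypothetical saved response from GET /api/ucp/backups (for illustration)
resp=$(mktemp)
cat > "$resp" <<'EOF'
[
  {"fileName": "backup2.tar", "backupState": "SUCCESS"},
  {"fileName": "backup1.tar", "backupState": "IN_PROGRESS"}
]
EOF

# Crude string match; in real use prefer: jq '.[] | select(.backupState=="IN_PROGRESS")'
if grep -q '"backupState": "IN_PROGRESS"' "$resp"; then
  echo "a backup is still running"
fi
```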
#### Retrieve backup details using the API
Retrieve details for a specific backup using the `GET: /api/ucp/backup/{backup_id}` endpoint, where `{backup_id}` is the ID of an existing backup. This request returns the backup, if it exists, for the specified ID, as a JSON object following the schema found in the [Backup schema](#backup-schema) section.
The request returns one of the following HTTP status codes, and if successful, the backup for the specified ID:
- 200: Success
- 404: Backup not found for the given `{backup_id}`
- 500: Internal server error
#### Backup schema
The following table describes the backup schema returned by the `GET` and `LIST` APIs:
| field name | JSON data type* | description |
|:------------: |:---------------: |:-------------------------------------------------------------------: |
| id | string | Unique ID |
| encrypted | boolean | Set to `true` if encrypted with a passphrase |
| fileName | string | Backup file name if backing up to a file, empty otherwise |
| logFileName | string | Backup log file name if saving backup logs, empty otherwise |
| backupPath | string | Host path where backup resides |
| backupState | string | Current state of the backup (`IN_PROGRESS`, `SUCCESS`, `FAILED`) |
| nodeLocation | string | Node on which the backup was taken |
| shortError | string | Short error. Empty unless `backupState` is set to `FAILED` |
| created_at | string | Time of backup creation |
| completed_at | string | Time of backup completion |
> *: JSON data type as defined per [JSON RFC 7159](https://tools.ietf.org/html/rfc7159).
### Where to go next
- [Back up DTR](back-up-dtr)

ee/admin/backup/index.md Normal file
View File

@ -0,0 +1,37 @@
---
title: Back up Docker Enterprise
description: Learn how to create a backup of your Docker Enterprise.
keywords: enterprise, backup
redirect_from:
- /enterprise/backup/
- /ee/backup/
---
## Introduction
This document provides instructions and best practices for Docker Enterprise backup procedures for all components of the platform.
> **Important**: Make sure you perform regular backups for Docker Enterprise, including prior to an upgrade or uninstallation.
## Prerequisites
- Before performing a backup or restore operation for any component of Docker Enterprise, you must have healthy managers. Otherwise, you must follow disaster recovery procedures instead.
- Have adequate space available for backup contents.
## Procedure
To back up Docker Enterprise, you must create individual backups
for each of the following components:
1. [Back up Docker Swarm](back-up-swarm). Back up Swarm resources like service and network definitions.
2. [Back up Universal Control Plane (UCP)](back-up-ucp). Back up UCP configurations.
3. [Back up Docker Trusted Registry (DTR)](back-up-dtr). Back up DTR configurations, images, and metadata.
If you do not create backups for all components, you cannot restore your deployment to its previous state.
Test each backup you create. One way to test your backups is to do
a fresh installation on a separate infrastructure with the backup. Refer to [Restore Docker Enterprise](/ee/admin/restore/) for additional information.
**Note**: Application data backup is **not** covered by these procedures. Backing up persistent storage data is the responsibility of the storage provider for the storage plugin or driver.
### Where to go next
- [Back up Docker Swarm](back-up-swarm)

View File

@ -0,0 +1,384 @@
---
title: Disaster recovery
description: Learn disaster recovery procedures for Docker Enterprise
keywords: enterprise, recovery, disaster recovery, dtr, ucp, swarm
redirect_from:
- /ee/dtr/admin/disaster-recovery/
- /ee/dtr/admin/disaster-recovery/repair-a-single-replica/
- /ee/dtr/admin/disaster-recovery/repair-a-cluster/
---
Disaster recovery procedures should be performed in the following order:
1. [Docker Swarm](#swarm-disaster-recovery).
2. [Universal Control Plane (UCP)](#ucp-disaster-recovery).
3. [Docker Trusted Registry (DTR)](#dtr-disaster-recovery).
## Swarm disaster recovery
### Recover from losing the quorum
Swarm is resilient to failures and the swarm can recover from any number
of temporary node failures (machine reboots or crashes with restart) or other
transient errors. However, a swarm cannot automatically recover if it loses a
quorum. Tasks on existing worker nodes continue to run, but administrative
tasks are not possible, including scaling or updating services and joining or
removing nodes from the swarm. The best way to recover is to bring the missing
manager nodes back online. If that is not possible, continue reading for some
options for recovering your swarm.
In a swarm of `N` managers, a quorum (a majority) of manager nodes must always
be available. For example, in a swarm with 5 managers, a minimum of 3 must be
operational and in communication with each other. In other words, the swarm can
tolerate up to `(N-1)/2` permanent failures; beyond that, requests involving
swarm management cannot be processed. These types of failures include data
corruption or hardware failures.
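The `(N-1)/2` bound is integer arithmetic, so it can be checked directly in the shell:

```bash
# Maximum permanent manager failures a swarm of N managers can tolerate
tolerance() { echo $(( ($1 - 1) / 2 )); }

tolerance 3   # 1
tolerance 5   # 2
tolerance 7   # 3
```

For example, a 5-manager swarm stays manageable with 2 permanent manager failures, which is why 5 managers are suggested when backups regularly take one manager offline.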
If you lose the quorum of managers, you cannot administer the swarm. If you have
lost the quorum and you attempt to perform any management operation on the swarm,
an error occurs:
```none
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
```
The best way to recover from losing the quorum is to bring the failed nodes back
online. If you can't do that, the only way to recover from this state is to use
the `--force-new-cluster` action from a manager node. This removes all managers
except the manager the command was run from. The quorum is achieved because
there is now only one manager. Promote nodes to be managers until you have the
desired number of managers.
```bash
# From the node to recover
docker swarm init --force-new-cluster --advertise-addr node01:2377
```
When you run the `docker swarm init` command with the `--force-new-cluster`
flag, the Docker Engine where you run the command becomes the manager node of a
single-node swarm which is capable of managing and running services. The manager
has all the previous information about services and tasks, worker nodes are
still part of the swarm, and services are still running. You need to add or
re-add manager nodes to achieve your previous task distribution and ensure that
you have enough managers to maintain high availability and prevent losing the
quorum.
### Force the swarm to rebalance
Generally, you do not need to force the swarm to rebalance its tasks. When you
add a new node to a swarm, or a node reconnects to the swarm after a
period of unavailability, the swarm does not automatically give a workload to
the idle node. This is a design decision. If the swarm periodically shifted tasks
to different nodes for the sake of balance, the clients using those tasks would
be disrupted. The goal is to avoid disrupting running services for the sake of
balance across the swarm. When new tasks start, or when a node with running
tasks becomes unavailable, those tasks are given to less busy nodes. The goal
is eventual balance, with minimal disruption to the end user.
In Docker 1.13 and higher, you can use the `--force` or `-f` flag with the
`docker service update` command to force the service to redistribute its tasks
across the available worker nodes. This causes the service tasks to restart.
Client applications may be disrupted. If you have configured it, your service
uses a [rolling update](/engine/swarm/swarm-tutorial/rolling-update/).
If you use an earlier version and you want to achieve an even balance of load
across workers and don't mind disrupting running tasks, you can force your swarm
to re-balance by temporarily scaling the service upward. Use
`docker service inspect --pretty <servicename>` to see the configured scale
of a service. When you use `docker service scale`, the nodes with the lowest
number of tasks are targeted to receive the new workloads. There may be multiple
under-loaded nodes in your swarm. You may need to scale the service up by modest
increments a few times to achieve the balance you want across all the nodes.
When the load is balanced to your satisfaction, you can scale the service back
down to the original scale. You can use `docker service ps` to assess the current
balance of your service across nodes.
See also
[`docker service scale`](/engine/reference/commandline/service_scale/) and
[`docker service ps`](/engine/reference/commandline/service_ps/).
## UCP disaster recovery
If half or more of the manager nodes are lost and cannot be recovered
to a healthy state, the system is considered to have lost quorum and can only be
restored through the following disaster recovery procedure.
### Recover a UCP cluster from an existing backup
1. If UCP is still installed on the swarm, uninstall UCP using the `uninstall-ucp` command.
> **Note**: If the restore is happening on new machines, skip this step.
2. Perform a [restore from an existing backup](/ee/admin/restore/) on any node. If there is an
existing swarm, the restore operation must be performed on a manager node. If no swarm exists,
the restore operation will create one.
### Recover a UCP cluster without an existing backup (not recommended)
If your cluster has lost quorum, you can still perform a backup of one of the remaining nodes.
> **Important**: Performing a backup after losing quorum is not guaranteed to succeed with
no loss of running services or configuration data. To properly protect against
manager failures, the system should be configured for
[high availability](/ee/ucp/admin/configure/join-nodes/), and backups should be performed regularly
in order to have complete backup data.
1. On one of the remaining manager nodes, perform `docker swarm init --force-new-cluster`. You might also need to specify an
`--advertise-addr` parameter, which is equivalent to the `--host-address`
parameter of the `docker/ucp install` operation. This instantiates a new
single-manager swarm by recovering as much state as possible from the
existing manager. This is a disruptive operation and existing tasks might be
either terminated or suspended.
2. [Create a backup](/ee/admin/backup/) of the remaining manager node.
3. If UCP is still installed on the swarm, uninstall UCP using the
`uninstall-ucp` command.
4. Perform a [restore](/ee/admin/restore/) on the recovered swarm manager node.
5. Log in to UCP and browse to the nodes page, or use the CLI `docker node ls`
command.
6. If any nodes are listed as `down`, you'll have to manually [remove these
nodes](/ee/ucp/admin/configure/scale-your-cluster/) from the swarm and then re-join
them using a `docker swarm join` operation with the swarm's new join-token.
7. [Create a backup](/ee/admin/backup/) of the restored cluster.
### Recreate objects within orchestrators that Docker Enterprise supports
Kubernetes backs up the declarative state of Kubernetes objects in etcd. Swarm, however, has no way to export its state to a declarative format, because the objects embedded within the Swarm raft logs are not easily transferable to other nodes or clusters.
For disaster recovery, recreating Swarm-related workloads requires the original scripts used for deployment. Alternatively, you can recreate workloads manually from the output of `docker inspect` commands.
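As a precaution, while the cluster is still healthy you can capture each service's definition so it can be recreated later. A minimal sketch, run from a machine with access to the swarm (the output directory is a hypothetical choice):

```shell
# Directory for the exported definitions (hypothetical location).
OUT_DIR=swarm-definitions
mkdir -p "$OUT_DIR"

# Save the full JSON definition of every service.
for svc in $(docker service ls -q 2>/dev/null); do
  docker service inspect "$svc" > "$OUT_DIR/service-$svc.json"
done
```

The saved JSON is a reference for rebuilding `docker service create` commands by hand; it is not directly replayable.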
## DTR disaster recovery
Docker Trusted Registry is a clustered application. You can join multiple
replicas for high availability. For a DTR cluster to be healthy, a majority of its replicas (n/2 + 1) need to
be healthy and be able to communicate with the other replicas. This is also
known as maintaining quorum.
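The quorum rule can be sketched with one line of shell arithmetic (an illustration, not a DTR command):

```shell
# Majority (quorum) for an n-replica cluster is floor(n/2) + 1.
majority() { echo $(( $1 / 2 + 1 )); }

majority 3  # prints 2: a 3-replica cluster tolerates 1 failure
majority 5  # prints 3: a 5-replica cluster tolerates 2 failures
```

Note that even cluster sizes buy no extra fault tolerance: a 6-replica cluster needs 4 healthy replicas, so it tolerates 2 failures, the same as a 5-replica cluster.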
This means that there are three possible failure scenarios.
### Replica is unhealthy but cluster maintains quorum
One or more replicas are unhealthy, but the overall majority (n/2 + 1) is still
healthy and able to communicate with one another.
![Failure scenario 1](/ee/dtr/images/dr-overview-1.svg)
In this example, the DTR cluster has five replicas, but one of the nodes stopped
working and another has problems with the DTR overlay network.
Even though these two replicas are unhealthy, the DTR cluster still has a majority
of working replicas, which means that the cluster is healthy.
In this case you should repair the unhealthy replicas, or remove them from
the cluster and join new ones.
#### Repair a single replica
When one or more DTR replicas are unhealthy but the overall majority
(n/2 + 1) is healthy and able to communicate with one another, your DTR
cluster is still functional and healthy.
![Cluster with two nodes unhealthy](/ee/dtr/images/repair-replica-1.svg)
Given that the DTR cluster is healthy, there's no need to execute any disaster
recovery procedures like restoring from a backup.
Instead, you should:
1. Remove the unhealthy replicas from the DTR cluster.
2. Join new replicas to make DTR highly available.
Since a DTR cluster requires a majority of replicas to be healthy at all times,
the order of these operations is important. If you join more replicas before
removing the ones that are unhealthy, your DTR cluster might become unhealthy.
##### Split-brain scenario
To understand why you should remove unhealthy replicas before joining new ones,
imagine you have a five-replica DTR deployment, and something goes wrong with
the overlay network connecting the replicas, causing them to be separated into
two groups.
![Cluster with network problem](/ee/dtr/images/repair-replica-2.svg)
Because the cluster originally had five replicas, it can work as long as
three replicas are still healthy and able to communicate (5 / 2 + 1 = 3).
Even though the network separated the replicas in two groups, DTR is still
healthy.
If at this point you join a new replica instead of fixing the network problem
or removing the two replicas that got isolated from the rest, it's possible
that the new replica ends up on the side of the network partition that has
fewer replicas.
![cluster with split brain](/ee/dtr/images/repair-replica-3.svg)
When this happens, both groups now have the minimum number of replicas needed
to establish a cluster. This is also known as a split-brain scenario, because
both groups can now accept writes and their histories start diverging, making
the two groups effectively two different clusters.
##### Remove replicas
To remove unhealthy replicas, you'll first have to find the replica ID
of one of the replicas you want to keep, and the replica IDs of the unhealthy
replicas you want to remove.
You can find the list of replicas by navigating to **Shared Resources > Stacks** or **Swarm > Volumes** (when using [swarm mode](/engine/swarm/)) on the UCP web interface, or by using the UCP
client bundle to run:
{% raw %}
```bash
docker ps --format "{{.Names}}" | grep dtr
# The list of DTR containers with <node>/<component>-<replicaID>, e.g.
# node-1/dtr-api-a1640e1c15b6
```
{% endraw %}
Another way to determine the replica ID is to SSH into a DTR node and run the following:
{% raw %}
```bash
REPLICA_ID=$(docker inspect -f '{{.Name}}' $(docker ps -q -f name=dtr-rethink) | cut -f 3 -d '-') && echo $REPLICA_ID
```
{% endraw %}
Then use the UCP client bundle to remove the unhealthy replicas:
```bash
docker run -it --rm {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} remove \
--existing-replica-id <healthy-replica-id> \
--replica-ids <unhealthy-replica-id> \
--ucp-insecure-tls \
--ucp-url <ucp-url> \
--ucp-username <user> \
--ucp-password <password>
```
You can remove more than one replica at a time by specifying multiple
IDs separated by commas.
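For instance, the comma-separated list can be built from a shell array before passing it to `--replica-ids` (the IDs below are hypothetical):

```shell
# Hypothetical replica IDs of the unhealthy replicas.
UNHEALTHY=(a1640e1c15b6 b2751f2d26c7)

# Join the array with commas, the format the --replica-ids flag expects.
REPLICA_IDS=$(IFS=,; echo "${UNHEALTHY[*]}")
echo "$REPLICA_IDS"  # a1640e1c15b6,b2751f2d26c7
```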
![Healthy cluster](/ee/dtr/images/repair-replica-4.svg)
##### Join replicas
Once you've removed the unhealthy nodes from the cluster, you should join new
ones to make sure your cluster is highly available.
Use your UCP client bundle to run the following command which prompts you for
the necessary parameters:
```bash
docker run -it --rm \
{{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} join \
--ucp-node <ucp-node-name> \
--ucp-insecure-tls
```
[Learn more about high availability](/ee/dtr/admin/configure/set-up-high-availability/).
### The majority of replicas are unhealthy
If a majority of replicas are unhealthy, making the cluster lose quorum, but at
least one replica is still healthy, or at least the data volumes for DTR are
accessible from that replica, you can repair the cluster without having to restore from
a backup. This minimizes the amount of data loss. The following image provides an example of this scenario.
![Failure scenario 2](/ee/dtr/images/dr-overview-2.svg)
#### Repair a cluster
For a DTR cluster to be healthy, a majority of its replicas (n/2 + 1) need to
be healthy and be able to communicate with the other replicas. This is known
as maintaining quorum.
In a scenario where quorum is lost, but at least one replica is still
accessible, you can use that replica to repair the cluster. That replica doesn't
need to be completely healthy. The cluster can still be repaired as the DTR
data volumes are persisted and accessible.
![Unhealthy cluster](/ee/dtr/images/repair-cluster-1.svg)
Repairing the cluster from an existing replica minimizes the amount of data lost.
If this procedure doesn't work, you'll have to
[restore from an existing backup](/ee/admin/restore/).
##### Diagnose an unhealthy cluster
When a majority of replicas are unhealthy, causing the overall DTR cluster to
become unhealthy, operations like `docker login`, `docker pull`, and `docker push`
present `internal server error`.
Accessing the `/_ping` endpoint of any replica also returns the same error.
It's also possible that the DTR web UI is partially or fully unresponsive.
##### Perform an emergency repair
Use the `docker/dtr emergency-repair` command to try to repair an unhealthy
DTR cluster from an existing replica.
This command checks that the data volumes for the DTR replica are uncorrupted,
redeploys all internal DTR components, and reconfigures them to use the existing
volumes.
It also reconfigures DTR by removing all other nodes from the cluster, leaving DTR
as a single-replica cluster with the replica you chose.
Start by finding the ID of the DTR replica that you want to repair from.
You can find the list of replicas by navigating to **Shared Resources > Stacks** or **Swarm > Volumes** (when using [swarm mode](/engine/swarm/)) on the UCP web interface, or by using
a UCP client bundle to run:
{% raw %}
```bash
docker ps --format "{{.Names}}" | grep dtr
# The list of DTR containers with <node>/<component>-<replicaID>, e.g.
# node-1/dtr-api-a1640e1c15b6
```
{% endraw %}
Another way to determine the replica ID is to SSH into a DTR node and run the following:
{% raw %}
```bash
REPLICA_ID=$(docker inspect -f '{{.Name}}' $(docker ps -q -f name=dtr-rethink) | cut -f 3 -d '-') && echo $REPLICA_ID
```
{% endraw %}
Then, use your UCP client bundle to run the emergency repair command:
```bash
docker run -it --rm {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} emergency-repair \
--ucp-insecure-tls \
--existing-replica-id <replica-id>
```
If the emergency repair procedure is successful, your DTR cluster now has a
single replica. You should now
[join more replicas for high availability](/ee/dtr/admin/configure/set-up-high-availability/).
![Healthy cluster](/ee/dtr/images/repair-cluster-2.svg)
If the emergency repair command fails, try running it again using a different
replica ID. As a last resort, you can restore your cluster from an existing
backup.
### All replicas are unhealthy
This is a total disaster scenario: all DTR replicas are lost, and the
data volumes for all replicas are corrupted or lost.
![Failure scenario 3](/ee/dtr/images/dr-overview-3.svg)
In a disaster scenario like this, you'll have to restore DTR from an existing
backup. Restoring from a backup should be used only as a last resort, since
an emergency repair might prevent some data loss.
[Restore from an existing backup](/ee/admin/restore/).
## Where to go next
- [Create a backup](/ee/admin/backup/)
- [Set up high availability](/ee/ucp/admin/configure/join-nodes/)

---
title: Restore Docker Enterprise
description: Learn how to restore Docker Enterprise platform from a backup.
keywords: enterprise, restore, recovery
---
You should only restore Docker Enterprise Edition from a backup as a last resort. If you're running Docker
Enterprise Edition in high-availability mode, you can remove unhealthy nodes from the
swarm and join new ones to bring the swarm back to a healthy state.
Restore components individually and in the following order:
1. [Restore Docker Swarm](restore-swarm).
2. [Restore Universal Control Plane (UCP)](restore-ucp).
3. [Restore Docker Trusted Registry (DTR)](restore-dtr).
## Where to go next
- [Restore Docker Swarm](restore-swarm)

---
title: Restore from a backup
description: Learn how to restore a DTR cluster from an existing backup
keywords: dtr, disaster recovery
redirect_from:
- /ee/dtr/admin/disaster-recovery/restore-from-backup/
---
{% assign metadata_backup_file = "dtr-metadata-backup.tar" %}
{% assign image_backup_file = "dtr-image-backup.tar" %}
## Restore DTR data
If your DTR has a majority of unhealthy replicas, the only way to restore it to
a working state is to restore from an existing backup.
To restore DTR, you need to:
1. Stop any DTR containers that might be running.
2. Restore the images from a backup.
3. Restore DTR metadata from a backup.
4. Re-fetch the vulnerability database.
You need to restore DTR on the same UCP cluster where you've created the
backup. If you restore on a different UCP cluster, all DTR resources will be
owned by users that don't exist, so you won't be able to manage the resources,
even though they're stored in the DTR data store.
When restoring, you need to use the same version of the `docker/dtr` image
that you used when creating the backup. Other versions are not guaranteed
to work.
### Remove DTR containers
Start by removing any DTR container that is still running:
```none
docker run -it --rm \
{{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} destroy \
--ucp-insecure-tls
```
### Restore images
If you had DTR configured to store images on the local filesystem, you can
extract your backup:
```none
sudo tar -xf {{ image_backup_file }} -C /var/lib/docker/volumes
```
If you're using a different storage backend, follow the best practices
recommended for that system.
### Restore DTR metadata
You can restore the DTR metadata with the `docker/dtr restore` command. This
performs a fresh installation of DTR, and reconfigures it with
the configuration created during a backup.
Load your UCP client bundle, and run the following command, replacing the
placeholders for the real values:
```bash
read -sp 'ucp password: ' UCP_PASSWORD;
```
This prompts you for the UCP password. Next, run the following to restore DTR from your backup. You can learn more about the supported flags in [docker/dtr restore](/reference/dtr/2.6/cli/restore).
```bash
docker run -i --rm \
--env UCP_PASSWORD=$UCP_PASSWORD \
{{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} restore \
--ucp-url <ucp-url> \
--ucp-insecure-tls \
--ucp-username <ucp-username> \
--ucp-node <hostname> \
--replica-id <replica-id> \
--dtr-external-url <dtr-external-url> < {{ metadata_backup_file }}
```
Where:
* `<ucp-url>` is the URL you use to access UCP
* `<ucp-username>` is the username of a UCP administrator
* `<hostname>` is the hostname of the node where you've restored the images
* `<replica-id>` is the ID of the replica you backed up
* `<dtr-external-url>` is the URL that clients use to access DTR
#### DTR 2.5 and below
If you're using NFS as a storage backend, also include `--nfs-storage-url` as
part of your restore command, otherwise DTR is restored but starts using a
local volume to persist your Docker images.
#### DTR 2.5 (with experimental online garbage collection) and DTR 2.6.0-2.6.3
> When running DTR 2.5 (with experimental online garbage collection) and 2.6.0 to 2.6.3, there is an issue with
> [reconfiguring and restoring DTR with `--nfs-storage-url`](/ee/dtr/release-notes#version-26) which leads to
> erased tags. Make sure to [back up your DTR metadata](/ee/dtr/admin/disaster-recovery/create-a-backup/#back-up-dtr-metadata)
> before you proceed. To work around the `--nfs-storage-url` flag issue, manually create a storage volume on each DTR node.
> To [restore DTR](/reference/dtr/2.6/cli/restore/) from an existing backup, use `docker/dtr restore`
> with `--dtr-storage-volume` and the new volume.
> See [Restore to a Local NFS Volume](https://success.docker.com/article/dtr-26-lost-tags-after-reconfiguring-storage#restoretoalocalnfsvolume)
> for Docker's recommended recovery strategy.
{: .info}
### Re-fetch the vulnerability database
If you're scanning images, you now need to download the vulnerability database. [Learn more](/ee/dtr/admin/configure/set-up-vulnerability-scans/).
After you successfully restore DTR, you can join new replicas the same way you
would after a fresh installation.
## Where to go next
- [docker/dtr restore](/reference/dtr/2.6/cli/restore/)

---
title: Restore Docker Swarm
description: Learn how to restore Docker Swarm from an existing backup
keywords: enterprise, restore, swarm
---
## Prerequisites
- You must use the same IP as the node from which you made the backup. The command to force the new cluster does not reset the IP in the Swarm data.
- You must restore the backup on the same Docker Engine version.
- You can find the list of manager IP addresses in `state.json` in the zip file.
- If `auto-lock` was enabled on the old Swarm, the unlock key is required to perform the restore.
## Perform Swarm restore
Use the following procedure on each manager node to restore data to a new swarm.
1. Shut down the Docker Engine on the node you select for the restore:
```
systemctl stop docker
```
2. Remove the contents of the `/var/lib/docker/swarm` directory on the new Swarm if it exists.
3. Restore the `/var/lib/docker/swarm` directory with the contents of the backup.
> **Note**: The new node uses the same encryption key for on-disk
> storage as the old one. It is not possible to change the on-disk storage
> encryption keys at this time. In the case of a swarm with auto-lock enabled,
> the unlock key is also the same as on the old swarm, and the unlock key is
> needed to restore the swarm.
4. Start Docker on the new node. Unlock the swarm if necessary.
```
systemctl start docker
```
5. Re-initialize the swarm so that the node does not attempt to connect to nodes that were part of the old swarm, and presumably no longer exist:
```
$ docker swarm init --force-new-cluster
```
6. Verify that the state of the swarm is as expected. This may include
application-specific tests or simply checking the output of
`docker service ls` to be sure that all expected services are present.
7. If you use auto-lock,
[rotate the unlock key](/engine/swarm/swarm_manager_locking.md#rotate-the-unlock-key).
8. Add the manager and worker nodes to the new swarm.
9. Reinstate your previous backup regimen on the new swarm.
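The steps above can be sketched as a single script run on the node selected for the restore. The backup path, archive format, and `systemctl` service name are assumptions; adapt them to your environment, and unlock the swarm after step 4 if auto-lock was enabled:

```shell
# Hypothetical backup location created by your backup procedure.
BACKUP=/var/backups/swarm-backup.tar.gz
SWARM_DIR=/var/lib/docker/swarm

systemctl stop docker                          # step 1: stop the engine
rm -rf "$SWARM_DIR" && mkdir -p "$SWARM_DIR"   # step 2: clear old state
tar -xzf "$BACKUP" -C "$SWARM_DIR"             # step 3: restore the backup
systemctl start docker                         # step 4: start the engine
docker swarm init --force-new-cluster          # step 5: re-initialize the swarm
docker service ls                              # step 6: verify expected services
```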
### Where to go next
- [Restore UCP](restore-ucp)

---
title: Restore UCP
description: Learn how to restore UCP from a backup
keywords: enterprise, restore, swarm
---
To restore UCP, select one of the following options:
* Run the restore on the machines from which the backup originated, or on new machines. You can use the same swarm from which the backup originated, or a new swarm.
* Run the restore on a manager node of an existing swarm that does not have UCP installed. In this case, the restore uses the existing swarm and runs instead of any install.
* Run the restore on a Docker Engine that isn't participating in a swarm. In this case, the restore performs `docker swarm init` in the same way as the install operation would: a new swarm is created and UCP is restored on top.
## Limitations
- To restore an existing UCP installation from a backup, you need to
uninstall UCP from the swarm by using the `uninstall-ucp` command.
[Learn to uninstall UCP](/ee/ucp/admin/install/uninstall/).
- Restore operations must run using the same major/minor UCP version (and `docker/ucp` image version) as the backed up cluster. Restoring to a later patch release version is allowed.
- If you restore UCP using a different Docker swarm than the one where UCP was
previously deployed on, UCP will start using new TLS certificates. Existing
client bundles won't work anymore, so you must download new ones.
## Kubernetes settings, data, and state
During the UCP restore, Kubernetes declarative objects are re-created, containers are re-created, and IPs are resolved.
For more information, see [Restoring an etcd cluster](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#restoring-an-etcd-cluster).
## Perform UCP restore
When the restore operation starts, it looks for the UCP version used in the backup and performs one of the following actions:
- Fails if the restore operation is running with an image that does not match the UCP version from the backup (a `--force` flag is available to override this if necessary)
- Provides instructions on how to run the restore process using the matching UCP version from the backup
Volumes are placed on the host where the UCP restore command runs.
The following example shows how to restore UCP from an existing backup file, presumed to be located at `/tmp/backup.tar` (replace `<UCP_VERSION>` with the version of your backup):
```
$ docker container run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp:<UCP_VERSION> restore < /tmp/backup.tar
```
If the backup file is encrypted with a passphrase, provide the passphrase to the restore operation (replace `<UCP_VERSION>` with the version of your backup):
```
$ docker container run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp:<UCP_VERSION> restore --passphrase "secret" < /tmp/backup.tar
```
The restore command may also be invoked in interactive mode, in which case the backup file should be mounted to the container rather than streamed through stdin (replace `<UCP_VERSION>` with the version of your backup):
```
$ docker container run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp/backup.tar:/config/backup.tar \
docker/ucp:<UCP_VERSION> restore -i
```
## Regenerate certs
The current certs volume contains cluster-specific information (such as SANs) that is invalid on new clusters with different IPs. For volumes that are not backed up (`ucp-node-certs`, for example), the restore regenerates certs. For certs that are backed up (`ucp-controller-server-certs`), the restore does not perform a regeneration, and you must correct those certs manually when the restore completes.
After you successfully restore UCP, you can add new managers and workers the same way you would after a fresh installation.
## Restore operation status
To monitor the status of a restore operation, view the standard output streams of the UCP bootstrap container.
## Verify the UCP restore
A successful UCP restore involves verifying the following items:
- All swarm managers are healthy after running the following command:
```
"curl -s -k https://localhost/_ping".
```
**Note**: Monitor all swarm managers for at least 15 minutes to ensure no degradation.
- No containers on swarm managers are marked as "unhealthy".
- All swarm managers and nodes are running containers with the new version.
- No swarm managers or nodes are running containers with the old version, except for Kubernetes Pods that use the "ucp-pause" image.
### Where to go next
- [Restore DTR](restore-dtr)

---
title: Backup Docker EE
description: Learn how to create a backup of your Docker Enterprise Edition, and how to restore from a backup.
keywords: enterprise, backup, restore
redirect_from:
- /enterprise/backup/
---
To back up Docker Enterprise Edition, you need to create individual backups
for each of the following components:
1. Docker Swarm. [Backup Swarm resources like service and network definitions](/engine/swarm/admin_guide.md#back-up-the-swarm).
2. Universal Control Plane (UCP). [Backup UCP configurations](/ee/ucp/admin/backups-and-disaster-recovery.md).
3. Docker Trusted Registry (DTR). [Backup DTR configurations and images](/ee/dtr/admin/disaster-recovery/create-a-backup.md).
Before proceeding to back up the next component, test the backup you've
created to make sure it isn't corrupt. One way to test your backups is to do
a fresh installation on separate infrastructure and restore the new installation
using the backup you've created.
If you only create a backup for a single component, you can't restore your
deployment to its previous state.
## Restore Docker Enterprise Edition
You should only restore from a backup as a last resort. If you're running Docker
Enterprise Edition in high-availability mode, you can remove unhealthy nodes from the
swarm and join new ones to bring the swarm back to a healthy state.
To restore Docker Enterprise Edition, you need to restore the individual
components one by one:
1. Docker Swarm. [Learn more](/engine/swarm/admin_guide.md#recover-from-disaster).
2. Universal Control Plane (UCP). [Learn more](/ee/ucp/admin/backups-and-disaster-recovery.md#restore-your-swarm).
3. Docker Trusted Registry (DTR). [Learn more](/ee/dtr/admin/disaster-recovery/restore-from-backup.md).
## Where to go next
- [Upgrade Docker EE](upgrade.md)