mirror of https://github.com/docker/docs.git
Backup, restore, and disaster recovery refactor (#1094)
* Consolidation/cleanup
* Minor wording update
* Clean up files and fix broken links

This commit is contained in:
parent 1591d9615f
commit 740512042a
@@ -1202,20 +1202,42 @@ manuals:
      title: About Docker EE
    - title: Try Docker EE on the cloud
      path: https://trial.docker.com
    - path: /ee/docker-ee-architecture/
      title: Docker EE Architecture
    - path: /ee/supported-platforms/
      title: Supported platforms
      nosync: true
    - path: /ee/end-to-end-install/
      title: Deploy Docker EE standard
    - path: /ee/backup/
      title: Backup Docker EE
      title: Deploy Docker EE standard
    - sectiontitle: Back up Docker Enterprise
      section:
      - path: /ee/admin/backup/
        title: Overview
      - path: /ee/admin/backup/back-up-swarm/
        title: Back up Docker Swarm
      - path: /ee/admin/backup/back-up-ucp/
        title: Back up UCP
      - path: /ee/admin/backup/back-up-dtr/
        title: Back up DTR
    - sectiontitle: Restore Docker Enterprise
      section:
      - path: /ee/admin/restore/
        title: Overview
      - path: /ee/admin/restore/restore-swarm/
        title: Restore Docker Swarm
      - path: /ee/admin/restore/restore-ucp/
        title: Restore UCP
      - path: /ee/admin/restore/restore-dtr/
        title: Restore DTR
    - sectiontitle: Disaster Recovery
      section:
      - path: /ee/admin/disaster-recovery/
        title: Overview
    - path: /ee/upgrade/
      title: Upgrade Docker EE
    - path: /ee/docker-ee-architecture/
      title: Docker EE Architecture
      title: Upgrade Docker Enterprise
    - path: /ee/telemetry/
      title: Manage usage data collection
    - sectiontitle: Docker EE Engine
    - sectiontitle: Engine
      section:
      - path: /ee/supported-platforms/
        title: Install Docker EE Engine
@@ -1348,8 +1370,6 @@ manuals:
        title: Troubleshoot with logs
      - path: /ee/ucp/admin/monitor-and-troubleshoot/troubleshoot-configurations/
        title: Troubleshoot configurations
      - path: /ee/ucp/admin/backups-and-disaster-recovery/
        title: Backups and disaster recovery
      - title: CLI reference
        path: /reference/ucp/3.1/cli/
        nosync: true
@@ -0,0 +1,190 @@
---
title: Back up DTR
description: Learn how to create a DTR backup
keywords: enterprise, backup, dtr, disaster recovery
redirect_from:
  - /ee/dtr/admin/disaster-recovery/create-a-backup/
---

{% assign metadata_backup_file = "dtr-metadata-backup.tar" %}
{% assign image_backup_file = "dtr-image-backup.tar" %}

Backups do not cause downtime for DTR.

## DTR backup contents

All metadata and authorization (authZ) information for a given DTR cluster is backed up.

| Data | Backed up | Description |
|:-----------------------------------|:----------|:---------------------------------------------------------------|
| Configurations | yes | DTR settings and cluster configurations |
| Repository metadata | yes | Metadata such as image architecture, repositories, images deployed, and size |
| Access control to repos and images | yes | Data about who has access to which images and repositories |
| Notary data | yes | Signatures and digests for images that are signed |
| Scan results | yes | Information about vulnerabilities in your images |
| Certificates and keys | yes | Certificates, public keys, and private keys that are used for mutual TLS communication |
| Image content | no | The images you push to DTR. Depending on the configuration, these can be stored on the file system of the node running DTR or on another storage system, and must be backed up separately. |
| Users, orgs, teams | no | Create a UCP backup to back up this data |
| Vulnerability database | no | Can be re-downloaded after a restore |

This data is persisted on the host running DTR, using named volumes.
[Learn more about DTR named volumes](/ee/dtr/architecture/).
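
To see these volumes on a node running DTR, you can list them with the Docker CLI. A minimal sketch; the `dtr-` name filter assumes the default volume naming described in the architecture page:

```bash
# List DTR named volumes on this node (names end with the replica ID)
docker volume ls --filter name=dtr-
```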

## Perform DTR backup

You should always create backups from the same DTR replica to ensure a smoother
restore. If you have not previously performed a backup, the web interface displays a warning:



To create a DTR backup, perform the following steps:

1. Run [docker/dtr backup](/reference/dtr/{{site.dtr_version}}/cli/backup/).
2. [Back up DTR image content](#back-up-image-content).
3. [Back up DTR metadata](#back-up-dtr-metadata).
4. [Verify your backup](#verify-your-backup).

### Run the DTR backup command (CLI)

#### Find your replica ID

Since you need your DTR replica ID during a backup, the following covers a few ways to determine it:

##### UCP web interface

You can find the list of replicas by navigating to **Shared Resources > Stacks** or **Swarm > Volumes** (when using [swarm mode](/engine/swarm/)) on the UCP web interface.

##### UCP client bundle

From a terminal [using a UCP client bundle](/ee/ucp/user-access/cli/), run:

{% raw %}
```bash
docker ps --format "{{.Names}}" | grep dtr

# The list of DTR containers with <node>/<component>-<replicaID>, e.g.
# node-1/dtr-api-a1640e1c15b6
```
{% endraw %}

##### SSH access

Another way to determine the replica ID is to SSH into a DTR node and run the following:

{% raw %}
```bash
REPLICA_ID=$(docker inspect -f '{{.Name}}' $(docker ps -q -f name=dtr-rethink) | cut -f 3 -d '-') && echo $REPLICA_ID
```
{% endraw %}

#### Back up image content

Since you can configure the storage backend that DTR uses to store images,
the way you back up images depends on the storage backend you're using.

If you've configured DTR to store images on the local file system or an NFS mount,
you can back up the images by using SSH to log in to a DTR node
and creating a tar archive of the [dtr-registry volume](/ee/dtr/architecture/):

{% raw %}
```none
sudo tar -cf {{ image_backup_file }} \
  -C /var/lib/docker/volumes/ dtr-registry-<replica-id>
```
{% endraw %}

If you're using a different storage backend, follow the best practices
recommended for that system.

#### Back up DTR metadata

To create a DTR backup, load your UCP client bundle, and run the following
command, replacing the placeholders with real values:

```bash
read -sp 'ucp password: ' UCP_PASSWORD;
```

This prompts you for the UCP password. Next, run the following to back up your DTR metadata and save the result into a tar archive. You can learn more about the supported flags in
the [reference documentation](/reference/dtr/2.6/cli/backup/).

```bash
docker run --log-driver none -i --rm \
  --env UCP_PASSWORD=$UCP_PASSWORD \
  {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} backup \
  --ucp-url <ucp-url> \
  --ucp-insecure-tls \
  --ucp-username <ucp-username> \
  --existing-replica-id <replica-id> > {{ metadata_backup_file }}
```

Where:

* `<ucp-url>` is the URL you use to access UCP.
* `<ucp-username>` is the username of a UCP administrator.
* `<replica-id>` is the ID of the DTR replica to back up.

By default, the backup command doesn't stop the DTR replica that is being backed up.
This means you can take frequent backups without affecting your users.

You can use the `--offline-backup` option to stop the DTR replica while taking
the backup. If you do this, remove the replica from the load balancing pool.
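
For example, a hedged sketch of an offline backup, combining the metadata backup command above with the `--offline-backup` flag (all placeholders as before):

```bash
# Stops this replica for the duration of the backup; remove it from the
# load balancer first, and add it back afterwards.
docker run --log-driver none -i --rm \
  --env UCP_PASSWORD=$UCP_PASSWORD \
  {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} backup \
  --ucp-url <ucp-url> \
  --ucp-insecure-tls \
  --ucp-username <ucp-username> \
  --offline-backup \
  --existing-replica-id <replica-id> > {{ metadata_backup_file }}
```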

Also, the backup contains sensitive information
like private keys, so you should encrypt the backup by running:

```none
gpg --symmetric {{ metadata_backup_file }}
```

This prompts you for a password to encrypt the backup, copies the backup file,
and encrypts it.

## Verify your backup

To validate that the backup was correctly performed, you can print the contents
of the tar files created. The backup of the images should look like:

```none
tar -tf {{ image_backup_file }}

dtr-backup-v{{ page.dtr_version }}/
dtr-backup-v{{ page.dtr_version }}/rethink/
dtr-backup-v{{ page.dtr_version }}/rethink/layers/
```

And the backup of the DTR metadata should look like:

```none
tar -tf {{ metadata_backup_file }}

# The archive should look like this
dtr-backup-v{{ page.dtr_version }}/
dtr-backup-v{{ page.dtr_version }}/rethink/
dtr-backup-v{{ page.dtr_version }}/rethink/properties/
dtr-backup-v{{ page.dtr_version }}/rethink/properties/0
```

If you've encrypted the metadata backup, you can use:

```none
gpg -d {{ metadata_backup_file }} | tar -t
```

You can also create a backup of a UCP cluster and restore it into a new
cluster. Then restore DTR on that new cluster to confirm that everything is
working as expected.

### Where to go next

- [Configure your storage backend](/ee/dtr/admin/configure/external-storage/)
- [Switch your storage backend](/ee/dtr/admin/configure/external-storage/storage-backend-migration/)
- [Use NFS](/ee/dtr/admin/configure/external-storage/nfs/)
- [Use S3](/ee/dtr/admin/configure/external-storage/s3/)
- CLI reference pages
  - [docker/dtr install](/reference/dtr/2.6/cli/install/)
  - [docker/dtr reconfigure](/reference/dtr/2.6/cli/reconfigure/)
  - [docker/dtr restore](/reference/dtr/2.6/cli/restore/)
@@ -0,0 +1,84 @@
---
title: Back up Docker Swarm
description: Learn how to create a backup of Docker Swarm
keywords: enterprise, backup, swarm
---

Docker manager nodes store the swarm state and manager logs in the `/var/lib/docker/swarm/` directory. Swarm raft logs contain crucial information for re-creating Swarm-specific resources, including services, secrets, configurations, and node cryptographic identity. In Docker 1.13 and higher, this data includes the keys used to encrypt the raft logs. Without these keys, you cannot restore the swarm.

You must perform a manual backup on each manager node, because logs contain node IP address information and are not transferable to other nodes. If you do not back up the raft logs, you cannot verify workloads or Swarm resource provisioning after restoring the cluster.

> You can avoid performing a Swarm backup by storing stacks, service definitions, secrets, and network definitions in a *Source Code Management* or *Config Management* tool.

## Swarm backup contents

| Data | Description | Backed up |
|:-------------------|:--------------------------------------------------------------------------------------|:----------|
| Raft keys | Used to encrypt communication among Swarm nodes and to encrypt and decrypt Raft logs | yes |
| Membership | List of the nodes in the cluster | yes |
| Services | Stacks and services stored in Swarm-mode | yes |
| Networks (overlay) | The overlay networks created on the cluster | yes |
| Configs | The configs created in the cluster | yes |
| Secrets | Secrets saved in the cluster | yes |
| Swarm unlock key | **Must be saved in a password manager!** | no |

## Procedure

1. If `auto-lock` is enabled, retrieve your Swarm unlock key and store it in a
   safe location; you need it to restore the swarm from backup. If you are unsure, read
   [Lock your swarm to protect its encryption key](/engine/swarm/swarm_manager_locking.md).
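
   As a minimal sketch, on a manager node of a swarm that already has auto-lock enabled, you can print just the current key with:

   ```
   docker swarm unlock-key -q
   ```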

2. Because you must stop the engine of the manager node before performing the backup, having three manager
   nodes is recommended for high availability (HA). For a cluster to be operational, a majority of managers
   must be online. If fewer than three managers exist, the cluster is unavailable during the backup.

   > **Note**: During the time that a manager is shut down, your swarm is more vulnerable to
   > losing the quorum if further nodes are lost. A loss of quorum means that the swarm is unavailable
   > until quorum is recovered. Quorum is only recovered when more than 50% of the nodes are again available.
   > If you regularly take down managers to do backups, consider running a 5-manager swarm, so that you
   > can lose an additional manager while the backup is running without disrupting services.

3. Select a manager node. Try not to select the leader, in order to avoid triggering a new election inside the cluster:

   ```
   docker node ls -f "role=manager" | tail -n+2 | grep -vi leader
   ```

   > Optional: Store the Docker version in a variable for easy addition to your backup name.

   {% raw %}
   ```
   ENGINE=$(docker version -f '{{.Server.Version}}')
   ```
   {% endraw %}

4. Stop the Docker Engine on the manager before backing up the data, so that no data is changed during the backup:

   ```
   systemctl stop docker
   ```

5. Back up the entire `/var/lib/docker/swarm` folder:

   ```
   tar cvzf "/tmp/swarm-${ENGINE}-$(hostname -s)-$(date +%s%z).tgz" /var/lib/docker/swarm/
   ```

   Note: _You can decode the Unix epoch in the filename by typing `date -d @timestamp`._ For example:

   ```
   date -d @1531166143
   Mon Jul  9 19:55:43 UTC 2018
   ```

6. Restart the manager Docker Engine:

   ```
   systemctl start docker
   ```

7. Except for step 1, repeat the previous steps for each manager node.

### Where to go next

- [Back up UCP](back-up-ucp)
@@ -0,0 +1,244 @@
---
title: Back up UCP
description: Learn how to create a backup of UCP
keywords: enterprise, backup, ucp
redirect_from:
  - /ee/ucp/admin/backups-and-disaster-recovery/
---

UCP backups no longer require pausing the reconciler and deleting UCP containers, and backing up a UCP manager does not disrupt the manager's activities.

Because UCP stores the same data on all manager nodes, you only need to back up a single UCP manager node.

User resources, such as services, containers, and stacks, are not affected by this
operation and continue operating as expected.

## Limitations

- Backups cannot be used to restore clusters onto a newer version of Docker Enterprise. For example, if a backup is taken on version N, a restore onto version N+1 is not supported.
- More than one backup at the same time is not supported. If a backup is attempted while another backup is in progress, or if two backups are scheduled at the same time, a message is displayed to indicate that the second backup failed because another backup is running.
- For crashed clusters, backup capability is not guaranteed. Perform regular backups to avoid this situation.
- UCP backup does not include swarm workloads.

## UCP backup contents

Backup contents are stored in a `.tar` file. Backups contain UCP configuration metadata to re-create configurations such as **Administration Settings** values (LDAP and SAML) and RBAC configurations (collections, grants, roles, users, and more):

| Data | Description | Backed up |
|:-----------------------|:--------------------------------------------------------------------------------------|:----------|
| Configurations | UCP configurations, including Docker EE license, Swarm, and client CAs | yes |
| Access control | Permissions for teams to swarm resources, including collections, grants, and roles | yes |
| Certificates and keys | Certificates and public and private keys used for authentication and mutual TLS communication | yes |
| Metrics data | Monitoring data gathered by UCP | yes |
| Organizations | Users, teams, and organizations | yes |
| Volumes | All [UCP named volumes](/ee/ucp/ucp-architecture/#volumes-used-by-ucp/), including all UCP component certificates and data. [Learn more about UCP named volumes](/ee/ucp/ucp-architecture/). | yes |
| Overlay Networks | Swarm-mode overlay network definitions, including port information | no |
| Configs, Secrets | Create a Swarm backup to back up this data | no |
| Services | Stacks and services are stored in Swarm-mode or SCM/Config Management | no |

> **Note**: Because Kubernetes stores the state of resources in `etcd`, a backup of `etcd` is sufficient for stateless backups; the procedure is described in [Backing up an etcd cluster](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster).

## Data not included in the backup

* `ucp-metrics-data`: holds the metrics server's data.
* `ucp-node-certs`: holds certs used to lock down UCP system components.
* Routing mesh settings. Interlock L7 ingress configuration information is not captured in UCP backups. A manual backup and restore process is possible and should be performed.

## Kubernetes settings, data, and state

UCP backups include all Kubernetes declarative objects (pods, deployments, replicasets, configs, and so on), including secrets.

> **Note**: Kube volumes and kube node labels are not backed up.

Upon restore, Kubernetes declarative objects are re-created. Containers are re-created and IPs are resolved.

For more information, see [Backing up an etcd cluster](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster).

## Specify a backup file

To avoid directly managing backup files, you can specify a file name and host directory on a secure and configured storage backend, such as NFS or another networked file system. The file system location is the backup folder on the manager node file system. This location must be writable by the `nobody` user, which you can ensure by changing the folder ownership to `nobody`. This operation requires administrator permissions on the manager node, and must only be run once for a given file system location.

```
sudo chown nobody:nogroup /path/to/folder
```

> **Important**:
> - Specify a different name for each backup file. Otherwise, the existing backup file with the same name is overwritten.
> - Specify a location that is mounted on a fault-tolerant file system (such as NFS) rather than the node's local disk. Otherwise, it is important to regularly move backups from the manager node's local disk to ensure adequate space for ongoing backups.

## UCP backup steps

There are several options for creating a UCP backup:

- [CLI](#create-a-ucp-backup-using-the-cli)
- [UI](#create-a-ucp-backup-using-the-ui)
- [API](#create-list-and-retrieve-ucp-backups-using-the-api)

The backup process runs on one manager node.

### Create a UCP backup using the CLI

The following example shows how to create a UCP manager node backup, encrypt it by using a passphrase, decrypt it, verify its contents, and store it in `/securelocation/backup.tar`:

1. Run the `{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup` command on a single UCP manager and include the `--file` and `--include-logs` options. This creates a tar archive with the contents of all [volumes used by UCP](/ee/ucp/ucp-architecture/)
   and streams it to `stdout`. Replace `$ORG` and `$TAG` with the organization and the version you are currently running.

   ```bash
   docker container run \
     --log-driver none --rm \
     --interactive \
     --name ucp \
     -v /var/run/docker.sock:/var/run/docker.sock \
     -v /tmp:/backup \
     $ORG/ucp:$TAG backup \
     --file /securelocation/backup.tar \
     --passphrase "secret12chars" \
     --include-logs false
   ```

   > **Note**: If you are running with Security-Enhanced Linux (SELinux) enabled, which is typical for RHEL hosts, you must include `--security-opt label=disable` in the `docker` command (replace `$version` with the version you are currently running):

   ```bash
   docker container run \
     --rm \
     --log-driver=none \
     --security-opt label=disable \
     --name ucp \
     -v /var/run/docker.sock:/var/run/docker.sock \
     docker/ucp:$version backup \
     --passphrase "secret12chars" > /securelocation/backup.tar
   ```

   > To determine whether SELinux is enabled in the engine, view the host's `/etc/docker/daemon.json` file, and search for the string `"selinux-enabled":"true"`.

#### View log and progress information

To view backup progress and error reporting, view the contents of the stderr streams of the running backup container during the backup. Progress is updated for each backup step: for example, after validation, after volumes are backed up, after `etcd` is backed up, and after `rethinkDB` is backed up. Progress is not preserved after the backup has completed.

#### Verify a UCP backup

Ensure the backup is a valid tar file by listing its contents, as shown in the following example. In a valid backup file, more than 100 files are listed and the `./ucp-node-certs/key.pem` file is present:

```
$ gpg --decrypt /securelocation/backup.tar | tar --list
```

If decryption is not needed, you can list the contents by removing the `--decrypt` flag, as shown in the following example:

```
$ tar --list -f /securelocation/backup.tar
```

### Create a UCP backup using the UI

1. In the UCP UI, navigate to **Admin Settings**.
2. Select **Backup Admin**.
3. Select **Backup Now** to trigger an immediate backup.

The UI also provides the following options:

- Display the status of a running backup
- Display backup history
- View backup contents

### Create, list, and retrieve UCP backups using the API

The UCP API provides three endpoints for managing UCP backups. You must be a UCP administrator to access these API endpoints.

#### Create a UCP backup using the API

Create a backup with the `POST: /api/ucp/backup` endpoint. This is a JSON endpoint with the following arguments:

| field name | JSON data type* | description |
|:------------:|:---------------:|:-----------------------------------------------:|
| passphrase | string | Encryption passphrase |
| noPassphrase | bool | Set to `true` if not using a passphrase |
| fileName | string | Backup file name |
| includeLogs | bool | Specifies whether to include a log file |
| hostPath | string | [File system location](#specify-a-backup-file) |

The request returns one of the following HTTP status codes and, if successful, a backup ID:

- 200: Success
- 500: Internal server error
- 400: Malformed request (payload fails validation)

##### Example

```
$ curl -sk -H "Authorization: Bearer $AUTHTOKEN" https://$UCP_HOSTNAME/api/ucp/backup \
    -X POST \
    -H "Content-Type: application/json" \
    --data '{"encrypted": true, "includeLogs": true, "fileName": "backup1.tar", "logFileName": "backup1.log", "hostPath": "/secure-location"}'
200 OK
```

where:

- `$AUTHTOKEN` is your authentication bearer token if using auth token identification.
- `$UCP_HOSTNAME` is your UCP hostname.

#### List all backups using the API

List existing backups with the `GET: /api/ucp/backups` endpoint. This request does not expect a payload and returns a list of backups, each as a JSON object following the schema found in the [Backup schema](#backup-schema) section.

The request returns one of the following HTTP status codes and, if successful, a list of existing backups:

- 200: Success
- 500: Internal server error

##### Example

```
curl -sk -H "Authorization: Bearer $AUTHTOKEN" https://$UCP_HOSTNAME/api/ucp/backups
[
  {
    "id": "0d0525dd-948a-41b4-9f25-c6b4cd6d9fe4",
    "encrypted": true,
    "fileName": "backup2.tar",
    "logFileName": "backup2.log",
    "backupPath": "/secure-location",
    "backupState": "SUCCESS",
    "nodeLocation": "ucp-node-ubuntu-0",
    "shortError": "",
    "created_at": "2019-04-10T21:55:53.775Z",
    "completed_at": "2019-04-10T21:56:01.184Z"
  },
  {
    "id": "2cf210df-d641-44ca-bc21-bda757c08d18",
    "encrypted": true,
    "fileName": "backup1.tar",
    "logFileName": "backup1.log",
    "backupPath": "/secure-location",
    "backupState": "IN_PROGRESS",
    "nodeLocation": "ucp-node-ubuntu-0",
    "shortError": "",
    "created_at": "2019-04-10T01:23:59.404Z",
    "completed_at": "0001-01-01T00:00:00Z"
  }
]
```

#### Retrieve backup details using the API

Retrieve details for a specific backup using the `GET: /api/ucp/backup/{backup_id}` endpoint, where `{backup_id}` is the ID of an existing backup. This request returns the backup, if it exists, for the specified ID, as a JSON object following the schema found in the [Backup schema](#backup-schema) section.

The request returns one of the following HTTP status codes and, if successful, the backup for the specified ID:

- 200: Success
- 404: Backup not found for the given `{backup_id}`
- 500: Internal server error
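
##### Example

Following the pattern of the other two endpoints, a minimal sketch of retrieving one backup; the ID shown is the hypothetical one returned by the earlier create call:

```
$ curl -sk -H "Authorization: Bearer $AUTHTOKEN" https://$UCP_HOSTNAME/api/ucp/backup/0d0525dd-948a-41b4-9f25-c6b4cd6d9fe4
```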

#### Backup schema

The following table describes the backup schema returned by the `GET` and `LIST` APIs:

| field name | JSON data type* | description |
|:-------------:|:---------------:|:---------------------------------------------------------------------:|
| id | string | Unique ID |
| encrypted | boolean | Set to `true` if encrypted with a passphrase |
| fileName | string | Backup file name if backing up to a file, empty otherwise |
| logFileName | string | Backup log file name if saving backup logs, empty otherwise |
| backupPath | string | Host path where backup resides |
| backupState | string | Current state of the backup (`IN_PROGRESS`, `SUCCESS`, `FAILED`) |
| nodeLocation | string | Node on which the backup was taken |
| shortError | string | Short error; empty unless `backupState` is set to `FAILED` |
| created_at | string | Time of backup creation |
| completed_at | string | Time of backup completion |

> *: JSON data type as defined in [JSON RFC 7159](https://tools.ietf.org/html/rfc7159).

### Where to go next

- [Back up DTR](back-up-dtr)
@@ -0,0 +1,37 @@
---
title: Back up Docker Enterprise
description: Learn how to create a backup of your Docker Enterprise.
keywords: enterprise, backup
redirect_from:
  - /enterprise/backup/
  - /ee/backup/
---

## Introduction

This document provides instructions and best practices for Docker Enterprise backup procedures, covering all components of the platform.

> **Important**: Make sure you perform regular backups of Docker Enterprise, including prior to an upgrade or uninstallation.

## Prerequisites

- Before performing a backup or restore operation for any component of Docker Enterprise, you must have healthy managers. Otherwise, you must follow the disaster recovery procedures instead.
- Ensure you have adequate storage space available for the backup contents.

## Procedure

To back up Docker Enterprise, you must create individual backups
for each of the following components, in order:

1. [Back up Docker Swarm](back-up-swarm). Back up Swarm resources like service and network definitions.
2. [Back up Universal Control Plane (UCP)](back-up-ucp). Back up UCP configurations.
3. [Back up Docker Trusted Registry (DTR)](back-up-dtr). Back up DTR configurations, images, and metadata.

If you do not create backups for all components, you cannot restore your deployment to its previous state.

Test each backup you create. One way to test your backups is to do
a fresh installation on separate infrastructure with the backup. Refer to [Restore Docker Enterprise](/ee/admin/restore/) for additional information.

> **Note**: Application data backup is **not** covered in this document. Backing up persistent storage data is the responsibility of the storage provider for the storage plugin or driver.

### Where to go next

- [Back up Docker Swarm](back-up-swarm)
@@ -0,0 +1,384 @@
---
title: Disaster recovery
description: Learn disaster recovery procedures for Docker Enterprise
keywords: enterprise, recovery, disaster recovery, dtr, ucp, swarm
redirect_from:
  - /ee/dtr/admin/disaster-recovery/
  - /ee/dtr/admin/disaster-recovery/repair-a-single-replica/
  - /ee/dtr/admin/disaster-recovery/repair-a-cluster/
---

Disaster recovery procedures should be performed in the following order:

1. [Docker Swarm](#swarm-disaster-recovery).
2. [Universal Control Plane (UCP)](#ucp-disaster-recovery).
3. [Docker Trusted Registry (DTR)](#dtr-disaster-recovery).

## Swarm disaster recovery

### Recover from losing the quorum

Swarm is resilient to failures and can recover from any number
of temporary node failures (machine reboots or crashes with restart) or other
transient errors. However, a swarm cannot automatically recover if it loses a
quorum. Tasks on existing worker nodes continue to run, but administrative
tasks are not possible, including scaling or updating services and joining or
removing nodes from the swarm. The best way to recover is to bring the missing
manager nodes back online. If that is not possible, continue reading for some
options for recovering your swarm.

In a swarm of `N` managers, a quorum (a majority) of manager nodes must always
be available. For example, in a swarm with five managers, a minimum of three must be
operational and in communication with each other. In other words, the swarm can
tolerate up to `(N-1)/2` permanent failures, beyond which requests involving
swarm management cannot be processed. These types of failures include data
corruption or hardware failures.

If you lose the quorum of managers, you cannot administer the swarm. If you have
lost the quorum and you attempt to perform any management operation on the swarm,
an error occurs:

```none
Error response from daemon: rpc error: code = 4 desc = context deadline exceeded
```

The best way to recover from losing the quorum is to bring the failed nodes back
online. If you can't do that, the only way to recover from this state is to use
the `--force-new-cluster` action from a manager node. This removes all managers
except the manager the command was run from. The quorum is achieved because
there is now only one manager. Promote nodes to be managers until you have the
desired number of managers.

```bash
# From the node to recover
docker swarm init --force-new-cluster --advertise-addr node01:2377
```

When you run the `docker swarm init` command with the `--force-new-cluster`
flag, the Docker Engine where you run the command becomes the manager node of a
single-node swarm which is capable of managing and running services. The manager
has all the previous information about services and tasks, worker nodes are
still part of the swarm, and services are still running. You need to add or
re-add manager nodes to achieve your previous task distribution and ensure that
you have enough managers to maintain high availability and prevent losing the
quorum.

### Force the swarm to rebalance

Generally, you do not need to force the swarm to rebalance its tasks. When you
add a new node to a swarm, or a node reconnects to the swarm after a
period of unavailability, the swarm does not automatically give a workload to
the idle node. This is a design decision. If the swarm periodically shifted tasks
to different nodes for the sake of balance, the clients using those tasks would
be disrupted. The goal is to avoid disrupting running services for the sake of
balance across the swarm. When new tasks start, or when a node with running
tasks becomes unavailable, those tasks are given to less busy nodes. The goal
is eventual balance, with minimal disruption to the end user.

In Docker 1.13 and higher, you can use the `--force` or `-f` flag with the
`docker service update` command to force the service to redistribute its tasks
across the available worker nodes. This causes the service tasks to restart.
Client applications may be disrupted. If you have configured it, your service
uses a [rolling update](/engine/swarm/swarm-tutorial/rolling-update/).

If you use an earlier version and you want to achieve an even balance of load
across workers and don't mind disrupting running tasks, you can force your swarm
to rebalance by temporarily scaling the service upward. Use
`docker service inspect --pretty <servicename>` to see the configured scale
of a service. When you use `docker service scale`, the nodes with the lowest
number of tasks are targeted to receive the new workloads. There may be multiple
under-loaded nodes in your swarm. You may need to scale the service up by modest
increments a few times to achieve the balance you want across all the nodes.

When the load is balanced to your satisfaction, you can scale the service back
down to the original scale. You can use `docker service ps` to assess the current
balance of your service across nodes.
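
As a minimal sketch of both approaches (the service name `web` and the scale values are hypothetical):

```bash
# Docker 1.13 and higher: force tasks to redistribute
docker service update --force web

# Earlier versions: temporarily scale up, check placement, then scale back
docker service scale web=12
docker service ps web
docker service scale web=10
```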

See also
[`docker service scale`](/engine/reference/commandline/service_scale/) and
[`docker service ps`](/engine/reference/commandline/service_ps/).

## UCP disaster recovery

In the event that half or more of the manager nodes are lost and cannot be recovered
to a healthy state, the system is considered to have lost quorum and can only be
restored through the following disaster recovery procedure.

### Recover a UCP cluster from an existing backup

1. If UCP is still installed on the swarm, uninstall UCP using the `uninstall-ucp` command.

   > **Note**: If the restore is happening on new machines, skip this step.

2. Perform a [restore from an existing backup](/ee/admin/restore/) on any node. If there is an
   existing swarm, the restore operation must be performed on a manager node. If no swarm exists,
   the restore operation creates one.

### Recover a UCP cluster without an existing backup (not recommended)

If your cluster has lost quorum, you can still perform a backup of one of the remaining nodes.

> **Important**: Performing a backup after losing quorum is not guaranteed to succeed with
> no loss of running services or configuration data. To properly protect against
> manager failures, the system should be configured for
> [high availability](/ee/ucp/admin/configure/join-nodes/), and backups should be performed regularly
> in order to have complete backup data.

1. On one of the remaining manager nodes, run `docker swarm init --force-new-cluster`. You might also need to specify an
   `--advertise-addr` parameter, which is equivalent to the `--host-address`
   parameter of the `docker/ucp install` operation. This instantiates a new
   single-manager swarm by recovering as much state as possible from the
   existing manager. This is a disruptive operation, and existing tasks might be
   either terminated or suspended.
2. [Create a backup](/ee/admin/backup/) of the remaining manager node.
3. If UCP is still installed on the swarm, uninstall UCP using the
   `uninstall-ucp` command.
4. Perform a [restore](/ee/admin/restore/) on the recovered swarm manager node.
5. Log in to UCP and browse to the nodes page, or use the CLI `docker node ls`
   command.
6. If any nodes are listed as `down`, you'll have to manually [remove these
   nodes](/ee/ucp/admin/configure/scale-your-cluster/) from the swarm and then re-join
   them using a `docker swarm join` operation with the swarm's new join-token (see the sketch after this list).
7. [Create a backup](/ee/admin/backup/) of the restored cluster.
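
A minimal sketch of the removal and re-join in step 6, with hypothetical node and address values:

```bash
# On the recovered manager: remove the down node and print a fresh join-token
docker node rm node-3
docker swarm join-token worker

# On the removed machine: re-join using the printed token
docker swarm join --token <worker-join-token> <manager-ip>:2377
```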

### Recreate objects within orchestrators that Docker Enterprise supports

Kubernetes backs up the declarative state of its objects in etcd. For Swarm, however, there is no way to take the state and export it to a declarative format, since the objects embedded within the Swarm raft logs are not easily transferable to other nodes or clusters.

For disaster recovery, recreating Swarm-related workloads requires having the original scripts used for deployment. Alternatively, you can recreate workloads by manually reconstructing the definitions from the output of `docker inspect` commands.
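
For example, a hypothetical sketch of capturing a service definition for later manual re-creation (`web` is a placeholder service name):

```bash
# Human-readable summary of the service configuration
docker service inspect --pretty web > web-service.txt

# Full JSON definition, useful when reconstructing the create command
docker service inspect web > web-service.json
```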

## DTR disaster recovery

Docker Trusted Registry is a clustered application. You can join multiple
replicas for high availability. For a DTR cluster to be healthy, a majority of its replicas (n/2 + 1) need to
be healthy and able to communicate with the other replicas. This is also
known as maintaining quorum.

This means that there are three possible failure scenarios.

### Replica is unhealthy but cluster maintains quorum

One or more replicas are unhealthy, but the overall majority (n/2 + 1) is still
healthy and able to communicate with one another.



In this example the DTR cluster has five replicas, but one of the nodes stopped
working and another has problems with the DTR overlay network.

Even though these two replicas are unhealthy, the DTR cluster has a majority
of replicas still working, which means that the cluster is healthy.

In this case you should repair the unhealthy replicas, or remove them from
the cluster and join new ones.

#### Repair a single replica

When one or more DTR replicas are unhealthy but the overall majority
(n/2 + 1) is healthy and able to communicate with one another, your DTR
cluster is still functional and healthy.



Given that the DTR cluster is healthy, there's no need to execute any disaster
recovery procedures like restoring from a backup.

Instead, you should:

1. Remove the unhealthy replicas from the DTR cluster.
2. Join new replicas to make DTR highly available.

Since a DTR cluster requires a majority of replicas to be healthy at all times,
the order of these operations is important. If you join more replicas before
removing the ones that are unhealthy, your DTR cluster might become unhealthy.

##### Split-brain scenario

To understand why you should remove unhealthy replicas before joining new ones,
imagine you have a five-replica DTR deployment, and something goes wrong with
the overlay network connection between the replicas, causing them to be separated into
two groups.



Because the cluster originally had five replicas, it can work as long as
three replicas are still healthy and able to communicate (5 / 2 + 1 = 3).
Even though the network separated the replicas into two groups, DTR is still
healthy.

If at this point you join a new replica instead of fixing the network problem
or removing the two replicas that got isolated from the rest, it's possible
that the new replica ends up on the side of the network partition that has
fewer replicas.



When this happens, both groups have the minimum number of replicas needed
to establish a cluster. This is also known as a split-brain scenario, because
both groups can now accept writes and their histories start diverging, making
the two groups effectively two different clusters.

##### Remove replicas

To remove unhealthy replicas, you first have to find the replica ID
of one of the replicas you want to keep, and the replica IDs of the unhealthy
replicas you want to remove.

You can find the list of replicas by navigating to **Shared Resources > Stacks** or **Swarm > Volumes** (when using [swarm mode](/engine/swarm/)) on the UCP web interface, or by using the UCP
client bundle to run:

{% raw %}
```bash
docker ps --format "{{.Names}}" | grep dtr

# The list of DTR containers with <node>/<component>-<replicaID>, e.g.
# node-1/dtr-api-a1640e1c15b6
```
{% endraw %}

Another way to determine the replica ID is to SSH into a DTR node and run the following:

{% raw %}
```bash
REPLICA_ID=$(docker inspect -f '{{.Name}}' $(docker ps -q -f name=dtr-rethink) | cut -f 3 -d '-') && echo $REPLICA_ID
```
{% endraw %}

Then use the UCP client bundle to remove the unhealthy replicas:

```bash
docker run -it --rm {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} remove \
  --existing-replica-id <healthy-replica-id> \
  --replica-ids <unhealthy-replica-id> \
  --ucp-insecure-tls \
  --ucp-url <ucp-url> \
  --ucp-username <user> \
  --ucp-password <password>
```

You can remove more than one replica at the same time by specifying multiple
IDs separated by commas.



##### Join replicas

Once you've removed the unhealthy nodes from the cluster, you should join new
ones to make sure your cluster is highly available.

Use your UCP client bundle to run the following command, which prompts you for
the necessary parameters:

```bash
docker run -it --rm \
  {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} join \
  --ucp-node <ucp-node-name> \
  --ucp-insecure-tls
```

[Learn more about high availability](/ee/dtr/admin/configure/set-up-high-availability/).

### The majority of replicas are unhealthy

If a majority of replicas are unhealthy, making the cluster lose quorum, but at
least one replica is still healthy, or at least the data volumes for DTR are
accessible from that replica, you can repair the cluster without having to restore from
a backup. This minimizes the amount of data loss. The following image provides an example of this scenario.



#### Repair a cluster

For a DTR cluster to be healthy, a majority of its replicas (n/2 + 1) need to
be healthy and able to communicate with the other replicas. This is known
as maintaining quorum.

In a scenario where quorum is lost, but at least one replica is still
accessible, you can use that replica to repair the cluster. That replica doesn't
need to be completely healthy. The cluster can still be repaired, as the DTR
data volumes are persisted and accessible.



Repairing the cluster from an existing replica minimizes the amount of data lost.
If this procedure doesn't work, you'll have to
[restore from an existing backup](/ee/admin/restore/).

##### Diagnose an unhealthy cluster

When a majority of replicas are unhealthy, causing the overall DTR cluster to
become unhealthy, operations like `docker login`, `docker pull`, and `docker push`
return an `internal server error`.

Accessing the `/_ping` endpoint of any replica also returns the same error.
It's also possible that the DTR web UI is partially or fully unresponsive.
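
As a quick check, you can query the `/_ping` endpoint of a replica directly; a minimal sketch, where `<dtr-url>` is a placeholder for your DTR address:

```bash
# A healthy replica returns HTTP 200; -k skips TLS verification
curl -ks -w '%{http_code}\n' https://<dtr-url>/_ping
```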

##### Perform an emergency repair

Use the `docker/dtr emergency-repair` command to try to repair an unhealthy
DTR cluster from an existing replica.

This command checks that the data volumes for the DTR replica are uncorrupted,
redeploys all internal DTR components, and reconfigures them to use the existing
volumes.

It also reconfigures DTR, removing all other nodes from the cluster and leaving DTR
as a single-replica cluster with the replica you chose.

Start by finding the ID of the DTR replica that you want to repair from.
You can find the list of replicas by navigating to **Shared Resources > Stacks** or **Swarm > Volumes** (when using [swarm mode](/engine/swarm/)) on the UCP web interface, or by using
a UCP client bundle to run:

{% raw %}
```bash
docker ps --format "{{.Names}}" | grep dtr

# The list of DTR containers with <node>/<component>-<replicaID>, e.g.
# node-1/dtr-api-a1640e1c15b6
```
{% endraw %}

Another way to determine the replica ID is to SSH into a DTR node and run the following:

{% raw %}
```bash
REPLICA_ID=$(docker inspect -f '{{.Name}}' $(docker ps -q -f name=dtr-rethink) | cut -f 3 -d '-') && echo $REPLICA_ID
```
{% endraw %}

Then, use your UCP client bundle to run the emergency repair command:

```bash
docker run -it --rm {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} emergency-repair \
  --ucp-insecure-tls \
  --existing-replica-id <replica-id>
```

If the emergency repair procedure is successful, your DTR cluster now has a
single replica. You should now
[join more replicas for high availability](/ee/dtr/admin/configure/set-up-high-availability/).



If the emergency repair command fails, try running it again using a different
replica ID. As a last resort, you can restore your cluster from an existing
backup.

### All replicas are unhealthy

This is a total disaster scenario where all DTR replicas are lost, causing
the data volumes for all DTR replicas to become corrupted or be lost.



In a disaster scenario like this, you'll have to restore DTR from an existing
backup. Restoring from a backup should only be used as a last resort, since
doing an emergency repair might prevent some data loss.

[Create a backup](/ee/admin/backup/).

## Where to go next

- [Create a backup](/ee/admin/backup/)
- [Set up high availability](/ee/ucp/admin/configure/join-nodes/)
@@ -0,0 +1,19 @@
---
title: Restore Docker Enterprise
description: Learn how to restore Docker Enterprise platform from a backup.
keywords: enterprise, restore, recovery
---

You should only restore Docker Enterprise Edition from a backup as a last resort. If you're running Docker
Enterprise Edition in high-availability mode, you can remove unhealthy nodes from the
swarm and join new ones to bring the swarm to a healthy state.

Restore components individually and in the following order:

1. [Restore Docker Swarm](restore-swarm).
2. [Restore Universal Control Plane (UCP)](restore-ucp).
3. [Restore Docker Trusted Registry (DTR)](restore-dtr).

## Where to go next

- [Restore Docker Swarm](restore-swarm)
@@ -0,0 +1,118 @@
---
title: Restore from a backup
description: Learn how to restore a DTR cluster from an existing backup
keywords: dtr, disaster recovery
redirect_from:
  - /ee/dtr/admin/disaster-recovery/restore-from-backup/
---

{% assign metadata_backup_file = "dtr-metadata-backup.tar" %}
{% assign image_backup_file = "dtr-image-backup.tar" %}

## Restore DTR data

If your DTR has a majority of unhealthy replicas, the only way to restore it to
a working state is by restoring from an existing backup.

To restore DTR, you need to:

1. Stop any DTR containers that might be running.
2. Restore the images from a backup.
3. Restore DTR metadata from a backup.
4. Re-fetch the vulnerability database.

You need to restore DTR on the same UCP cluster where you created the
backup. If you restore on a different UCP cluster, all DTR resources will be
owned by users that don't exist, so you won't be able to manage the resources,
even though they're stored in the DTR data store.

When restoring, you need to use the same version of the `docker/dtr` image
that you used when creating the backup. Other versions are not guaranteed
to work.

### Remove DTR containers

Start by removing any DTR container that is still running:

```none
docker run -it --rm \
  {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} destroy \
  --ucp-insecure-tls
```

### Restore images

If you had DTR configured to store images on the local filesystem, you can
extract your backup:

```none
sudo tar -xf {{ image_backup_file }} -C /var/lib/docker/volumes
```

If you're using a different storage backend, follow the best practices
recommended for that system.

### Restore DTR metadata

You can restore the DTR metadata with the `docker/dtr restore` command. This
performs a fresh installation of DTR and reconfigures it with
the configuration created during a backup.

Load your UCP client bundle, and run the following command, replacing the
placeholders with real values:

```bash
read -sp 'ucp password: ' UCP_PASSWORD;
```

This prompts you for the UCP password. Next, run the following to restore DTR from your backup. You can learn more about the supported flags in [docker/dtr restore](/reference/dtr/2.6/cli/restore).

```bash
docker run -i --rm \
  --env UCP_PASSWORD=$UCP_PASSWORD \
  {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} restore \
  --ucp-url <ucp-url> \
  --ucp-insecure-tls \
  --ucp-username <ucp-username> \
  --ucp-node <hostname> \
  --replica-id <replica-id> \
  --dtr-external-url <dtr-external-url> < {{ metadata_backup_file }}
```

Where:

* `<ucp-url>` is the URL you use to access UCP.
* `<ucp-username>` is the username of a UCP administrator.
* `<hostname>` is the hostname of the node where you've restored the images.
* `<replica-id>` is the ID of the replica you backed up.
* `<dtr-external-url>` is the URL that clients use to access DTR.

#### DTR 2.5 and below

If you're using NFS as a storage backend, also include `--nfs-storage-url` as
part of your restore command; otherwise DTR is restored but starts using a
local volume to persist your Docker images.
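
As a hedged sketch, this is the metadata restore command from above with the extra flag added; the NFS URL is a placeholder:

```bash
docker run -i --rm \
  --env UCP_PASSWORD=$UCP_PASSWORD \
  {{ page.dtr_org }}/{{ page.dtr_repo }}:{{ page.dtr_version }} restore \
  --ucp-url <ucp-url> \
  --ucp-insecure-tls \
  --ucp-username <ucp-username> \
  --ucp-node <hostname> \
  --replica-id <replica-id> \
  --nfs-storage-url nfs://<nfs-server>/<mount-point> \
  --dtr-external-url <dtr-external-url> < {{ metadata_backup_file }}
```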

#### DTR 2.5 (with experimental online garbage collection) and DTR 2.6.0-2.6.3

> When running DTR 2.5 (with experimental online garbage collection) or 2.6.0 to 2.6.3, there is an issue with
> [reconfiguring and restoring DTR with `--nfs-storage-url`](/ee/dtr/release-notes#version-26) which leads to
> erased tags. Make sure to [back up your DTR metadata](/ee/dtr/admin/disaster-recovery/create-a-backup/#back-up-dtr-metadata)
> before you proceed. To work around the `--nfs-storage-url` flag issue, manually create a storage volume on each DTR node.
> To [restore DTR](/reference/dtr/2.6/cli/restore/) from an existing backup, use `docker/dtr restore`
> with `--dtr-storage-volume` and the new volume.
> See [Restore to a Local NFS Volume](https://success.docker.com/article/dtr-26-lost-tags-after-reconfiguring-storage#restoretoalocalnfsvolume)
> for Docker's recommended recovery strategy.
{: .info}

### Re-fetch the vulnerability database

If you're scanning images, you now need to re-download the vulnerability database.
[Learn more](/ee/dtr/admin/configure/set-up-vulnerability-scans.md).

After you successfully restore DTR, you can join new replicas the same way you
would after a fresh installation.

## Where to go next

- [docker/dtr restore](/reference/dtr/2.6/cli/restore/)
@@ -0,0 +1,53 @@
---
title: Restore Docker Swarm
description: Learn how to restore Docker Swarm from an existing backup
keywords: enterprise, restore, swarm
---

## Prerequisites

- You must use the same IP address as the node from which you made the backup. The command to force the new cluster does not reset the IP address in the Swarm data.
- You must restore the backup on the same Docker Engine version.
- You can find the list of manager IP addresses in `state.json` in the backup archive.
- If `auto-lock` was enabled on the old Swarm, the unlock key is required to perform the restore.

## Perform Swarm restore

Use the following procedure on each manager node to restore data to a new swarm.

1. Shut down the Docker Engine on the node you select for the restore:

   ```
   systemctl stop docker
   ```

2. Remove the contents of the `/var/lib/docker/swarm` directory on the new Swarm, if it exists.
3. Restore the `/var/lib/docker/swarm` directory with the contents of the backup, as sketched below.
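
   A minimal sketch, assuming the archive was created with the `tar cvzf` command from the backup procedure (GNU tar stores the paths without a leading slash, so extracting from `/` restores them in place):

   ```
   tar -xvzf /tmp/swarm-<engine-version>-<hostname>-<timestamp>.tgz -C /
   ```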
|
||||
|
||||
> **Note**: The new node uses the same encryption key for on-disk
|
||||
> storage as the old one. It is not possible to change the on-disk storage
|
||||
> encryption keys at this time. In the case of a swarm with auto-lock enabled,
|
||||
> the unlock key is also the same as on the old swarm, and the unlock key is
|
||||
> needed to restore the swarm.
|
||||
|
||||
4. Start Docker on the new node. Unlock the swarm if necessary.
|
||||
|
||||
```
|
||||
systemctl start docker
|
||||
```
|
||||
5. Re-initialize the swarm so that the node does not attempt to connect to nodes that were part of the old swarm, and presumably no longer exist:
|
||||
|
||||
```
|
||||
$ docker swarm init --force-new-cluster
|
||||
```
|
||||
|
||||
6. Verify that the state of the swarm is as expected. This may include
|
||||
application-specific tests or simply checking the output of
|
||||
`docker service ls` to be sure that all expected services are present.
|
||||
|
||||
7. If you use auto-lock,
|
||||
[rotate the unlock key](/engine/swarm/swarm_manager_locking.md#rotate-the-unlock-key).
|
||||
8. Add the manager and worker nodes to the new swarm.
|
||||
9. Reinstate your previous backup regimen on the new swarm.
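
A minimal sketch of steps 2 and 3, assuming the backup is a zip archive named `swarm-backup.zip` that holds the contents of `/var/lib/docker/swarm` at its root (both the file name and the archive layout are assumptions):

```
# Step 2: clear any existing swarm state (Docker is already stopped):
rm -rf /var/lib/docker/swarm

# Step 3: restore the directory from the backup archive:
mkdir -p /var/lib/docker/swarm
unzip swarm-backup.zip -d /var/lib/docker/swarm
```

For step 7, `docker swarm unlock-key --rotate` rotates the unlock key; for step 8, `docker swarm join-token manager` and `docker swarm join-token worker` print the join commands to run on the nodes you add.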

### Where to go next

- [Restore UCP](restore-ucp)

@ -0,0 +1,98 @@
---
title: Restore UCP
description: Learn how to restore UCP from a backup
keywords: enterprise, restore, swarm
---

To restore UCP, select one of the following options:

* Run the restore on the machines from which the backup originated or on new machines. You can use the same swarm from which the backup originated or a new swarm.
* Run the restore on a manager node of an existing swarm that does not have UCP installed. In this case, the UCP restore uses the existing swarm and runs instead of an install.
* Run the restore on a Docker Engine that isn't participating in a swarm, in which case it performs `docker swarm init` in the same way as the install operation would. A new swarm is created and UCP is restored on top.

## Limitations

- To restore an existing UCP installation from a backup, you need to
  uninstall UCP from the swarm by using the `uninstall-ucp` command
  (see the sketch after this list).
  [Learn to uninstall UCP](/ee/ucp/admin/install/uninstall/).
- Restore operations must run using the same major/minor UCP version (and `docker/ucp` image version) as the backed-up cluster. Restoring to a later patch release version is allowed.
- If you restore UCP using a different Docker swarm than the one on which UCP
  was previously deployed, UCP starts using new TLS certificates. Existing
  client bundles won't work anymore, so you must download new ones.
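
A hedged sketch of that uninstall, run from a manager node; `<UCP_VERSION>` stands for the currently installed version, and the interactive flag mirrors the restore examples below:

```
# Remove UCP from the swarm before restoring from a backup:
docker container run --rm -it --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp:<UCP_VERSION> uninstall-ucp --interactive
```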

## Kubernetes settings, data, and state

During the UCP restore, Kubernetes declarative objects are re-created, containers are re-created, and IPs are resolved.

For more information, see [Restoring an etcd cluster](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#restoring-an-etcd-cluster).

## Perform UCP restore

When the restore operation starts, it looks for the UCP version used in the backup and performs one of the following actions:

- Fails if the restore operation is running using an image that does not match the UCP version from the backup (a `--force` flag is available to override this if necessary)
- Provides instructions on how to run the restore process using the matching UCP version from the backup

Volumes are placed on the host where you run the UCP restore command.

The following example shows how to restore UCP from an existing backup file, presumed to be located at `/tmp/backup.tar` (replace `<UCP_VERSION>` with the version of your backup):

```
$ docker container run --rm -i --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp:<UCP_VERSION> restore < /tmp/backup.tar
```

If the backup file is encrypted with a passphrase, provide the passphrase to the restore operation (replace `<UCP_VERSION>` with the version of your backup):

```
$ docker container run --rm -i --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp:<UCP_VERSION> restore --passphrase "secret" < /tmp/backup.tar
```

The restore command may also be invoked in interactive mode, in which case the backup file should be mounted to the container rather than streamed through `stdin` (replace `<UCP_VERSION>` with the version of your backup):

```
$ docker container run --rm -i --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp/backup.tar:/config/backup.tar \
  docker/ucp:<UCP_VERSION> restore -i
```

## Regenerate certs

The certs volume contains cluster-specific information (such as SANs) that is invalid on new clusters with different IPs. For volumes that are not backed up (`ucp-node-certs`, for example), the restore regenerates certs. For certs that are backed up (`ucp-controller-server-certs`), the restore does not perform a regeneration, and you must correct those certs when the restore completes.

After you successfully restore UCP, you can add new managers and workers the same way you would after a fresh installation.

## Restore operation status

To check the status of a restore operation, view the standard output and error streams of the UCP bootstrap container.
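
For example, if you launch the restore in the background rather than in the foreground, you could follow those streams with `docker logs`; this assumes the bootstrap container is named `ucp`, as in the examples above:

```
# Follow the bootstrap container's output while the restore runs:
docker logs --follow ucp
```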

## Verify the UCP restore

A successful UCP restore involves verifying the following items (example spot checks follow the list):

- All swarm managers are healthy after running the following command:

  ```
  curl -s -k https://localhost/_ping
  ```

  **Note**: Monitor all swarm managers for at least 15 minutes to ensure no degradation.

- No containers on swarm managers are marked as "unhealthy".
- All swarm managers and nodes are running containers with the new version.
- No swarm managers or nodes are running containers with the old version, except for Kubernetes Pods that use the `ucp-pause` image.
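
Two of the checks above as shell one-liners; the health filter is standard Docker CLI, while matching on the string `ucp` to spot UCP component versions is an assumption about image naming:

```
# List containers the engine reports as unhealthy:
docker ps --filter "health=unhealthy"

# Show the images (and therefore versions) of running UCP components:
docker ps | grep ucp
```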

### Where to go next

- [Restore DTR](restore-dtr)

ee/backup.md

@ -1,39 +0,0 @@
---
title: Backup Docker EE
description: Learn how to create a backup of your Docker Enterprise Edition, and how to restore from a backup.
keywords: enterprise, backup, restore
redirect_from:
- /enterprise/backup/
---

To back up Docker Enterprise Edition, you need to create individual backups
for each of the following components:

1. Docker Swarm. [Back up Swarm resources like service and network definitions](/engine/swarm/admin_guide.md#back-up-the-swarm).
2. Universal Control Plane (UCP). [Back up UCP configurations](/ee/ucp/admin/backups-and-disaster-recovery.md).
3. Docker Trusted Registry (DTR). [Back up DTR configurations and images](/ee/dtr/admin/disaster-recovery/create-a-backup.md).

Before proceeding to back up the next component, you should test the backup you've
created to make sure it's not corrupt. One way to test your backups is to do
a fresh installation in a separate infrastructure and restore the new installation
using the backup you've created.

If you only create backups for a single component, you can't restore your
deployment to its previous state.

## Restore Docker Enterprise Edition

You should only restore from a backup as a last resort. If you're running Docker
Enterprise Edition in high-availability mode, you can remove unhealthy nodes from the
swarm and join new ones to bring the swarm to a healthy state.

To restore Docker Enterprise Edition, you need to restore the individual
components one by one:

1. Docker Swarm. [Learn more](/engine/swarm/admin_guide.md#recover-from-disaster).
2. Universal Control Plane (UCP). [Learn more](/ee/ucp/admin/backups-and-disaster-recovery.md#restore-your-swarm).
3. Docker Trusted Registry (DTR). [Learn more](/ee/dtr/admin/disaster-recovery/restore-from-backup.md).

## Where to go next

- [Upgrade Docker EE](upgrade.md)