mirror of https://github.com/docker/docs.git
Port monitoring topics to template (#388)
parent
e37b7ccc6a
commit
1b9d4807b7
|
|
@ -1,13 +1,24 @@
|
|||
---
|
||||
title: Backups and disaster recovery
|
||||
description: Learn how to back up your Docker Universal Control Plane swarm, and
|
||||
to recover your swarm from an existing backup.
|
||||
keywords: ucp, backup, restore, recovery
|
||||
title: Backups and disaster recovery
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
- path: configure/join-nodes/
|
||||
title: Set up high availability
|
||||
- path: ../ucp-architecture/
|
||||
title: UCP architecture
|
||||
---
|
||||
{% if include.version=="ucp-3.0" %}
|
||||
|
||||
When you decide to start using Docker Universal Control Plane on a production
|
||||
setting, you should
|
||||
[configure it for high availability](configure/set-up-high-availability.md).
|
||||
[configure it for high availability](configure/join-nodes/index.md).
|
||||
|
||||
The next step is creating a backup policy and disaster recovery plan.
|
||||
|
||||
|
|
@ -25,7 +36,7 @@ UCP maintains data about:
|
|||
| Volumes | All [UCP named volumes](../architecture/#volumes-used-by-ucp), which include all UCP component certs and data |
|
||||
|
||||
This data is persisted on the host running UCP, using named volumes.
|
||||
[Learn more about UCP named volumes](../architecture.md).
|
||||
[Learn more about UCP named volumes](../ucp-architecture.md).
|
||||
|
||||
## Backup steps
|
||||
|
||||
|
|
@ -33,18 +44,18 @@ Back up your Docker EE components in the following order:
|
|||
|
||||
1. [Back up your swarm](/engine/swarm/admin_guide/#back-up-the-swarm)
|
||||
2. Back up UCP
|
||||
3. [Back up DTR](../../../../dtr/2.3/guides/admin/backups-and-disaster-recovery.md)
|
||||
3. [Back up DTR](../../../../dtr/2.5/guides/admin/backups-and-disaster-recovery.md)
|
||||
|
||||
## Backup policy
|
||||
|
||||
As part of your backup policy you should regularly create backups of UCP.
|
||||
DTR is backed up independently.
|
||||
[Learn about DTR backups and recovery](../../../../dtr/2.3/guides/admin/backups-and-disaster-recovery.md).
|
||||
[Learn about DTR backups and recovery](../../../../dtr/2.5/guides/admin/backups-and-disaster-recovery.md).
|
||||
|
||||
To create a UCP backup, run the `{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup` command
|
||||
on a single UCP manager. This command creates a tar archive with the
|
||||
contents of all the [volumes used by UCP](../architecture.md) to persist data
|
||||
and streams it to stdout. The backup doesn't include the swarm-mode state,
|
||||
contents of all the [volumes used by UCP](../ucp-architecture.md) to persist data
|
||||
and streams it to `stdout`. The backup doesn't include the swarm-mode state,
|
||||
like service definitions and overlay network definitions.
|
||||
|
||||
You only need to run the backup command on a single UCP manager node. Since UCP
|
||||
|
|
@ -66,7 +77,7 @@ temporarily unable to:
|
|||
|
||||
To minimize the impact of the backup policy on your business, you should:
|
||||
|
||||
* Configure UCP for [high availability](configure/set-up-high-availability.md).
|
||||
* Configure UCP for [high availability](configure/join-nodes/index.md).
|
||||
This allows load-balancing user requests across multiple UCP manager nodes.
|
||||
* Schedule the backup to take place outside business hours.
|
||||
|
||||
|
|
@ -77,14 +88,14 @@ verify its contents:
|
|||
|
||||
```none
|
||||
# Create a backup and store it in /tmp/backup.tar
|
||||
$ docker container run --log-driver none --rm -i --name ucp \
|
||||
docker container run --log-driver none --rm -i --name ucp \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup --interactive > /tmp/backup.tar
|
||||
|
||||
# Ensure the backup is a valid tar and list its contents
|
||||
# In a valid backup file, over 100 files should appear in the list
|
||||
# and the `./ucp-node-certs/key.pem` file should be present
|
||||
$ tar --list -f /tmp/backup.tar
|
||||
tar --list -f /tmp/backup.tar
|
||||
```
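
The manual check described in the comments of the example above can be scripted. The following is a minimal sketch, not something shipped with UCP: the function name `verify_ucp_backup` and the `min_entries` variable are hypothetical, and the two criteria (more than 100 entries, and the presence of `./ucp-node-certs/key.pem`) are taken from those comments. It assumes the backup was created without a passphrase.

```bash
# Hypothetical helper (not part of UCP): sanity-check an unencrypted backup archive.
verify_ucp_backup() {
  backup_file="$1"
  min_entries=100   # threshold taken from the guidance in the example above

  # Listing fails if the file is not a readable tar archive.
  listing="$(tar --list -f "$backup_file" 2>/dev/null)" || {
    echo "invalid: not a readable tar archive"
    return 1
  }

  # Count the entries in the listing, one per line.
  entry_count="$(printf '%s\n' "$listing" | grep -c '')"
  if [ "$entry_count" -le "$min_entries" ]; then
    echo "invalid: only $entry_count entries"
    return 1
  fi

  # The node key is expected at this exact path inside the archive.
  if ! printf '%s\n' "$listing" | grep -qx './ucp-node-certs/key.pem'; then
    echo "invalid: ./ucp-node-certs/key.pem is missing"
    return 1
  fi

  echo "ok: $entry_count entries"
}
```

Calling `verify_ucp_backup /tmp/backup.tar` right after the backup command lets a scheduled job fail early instead of storing an unusable archive.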
|
||||
|
||||
A backup file may optionally be encrypted using a passphrase, as in the
|
||||
|
|
@ -92,13 +103,13 @@ following example:
|
|||
|
||||
```none
|
||||
# Create a backup, encrypt it, and store it on /tmp/backup.tar
|
||||
$ docker container run --log-driver none --rm -i --name ucp \
|
||||
docker container run --log-driver none --rm -i --name ucp \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup --interactive \
|
||||
--passphrase "secret" > /tmp/backup.tar
|
||||
|
||||
# Decrypt the backup and list its contents
|
||||
$ gpg --decrypt /tmp/backup.tar | tar --list
|
||||
gpg --decrypt /tmp/backup.tar | tar --list
|
||||
```
|
||||
|
||||
### Security-Enhanced Linux (SELinux)
|
||||
|
|
@ -108,7 +119,7 @@ which is typical for RHEL hosts, you need to include `--security-opt label=disab
|
|||
in the `docker` command:
|
||||
|
||||
```bash
|
||||
$ docker container run --security-opt label=disable --log-driver none --rm -i --name ucp \
|
||||
docker container run --security-opt label=disable --log-driver none --rm -i --name ucp \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup --interactive > /tmp/backup.tar
|
||||
```
|
||||
|
|
@ -129,7 +140,7 @@ UCP from an existing backup file, presumed to be located at
|
|||
`/tmp/backup.tar`:
|
||||
|
||||
```none
|
||||
$ docker container run --rm -i --name ucp \
|
||||
docker container run --rm -i --name ucp \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} restore < /tmp/backup.tar
|
||||
```
|
||||
|
|
@ -138,17 +149,17 @@ If the backup file is encrypted with a passphrase, you will need to provide the
|
|||
passphrase to the restore operation:
|
||||
|
||||
```none
|
||||
$ docker container run --rm -i --name ucp \
|
||||
docker container run --rm -i --name ucp \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} restore --passphrase "secret" < /tmp/backup.tar
|
||||
```
|
||||
|
||||
The restore command may also be invoked in interactive mode, in which case the
|
||||
backup file should be mounted to the container rather than streamed through
|
||||
stdin:
|
||||
`stdin`:
|
||||
|
||||
```none
|
||||
$ docker container run --rm -i --name ucp \
|
||||
docker container run --rm -i --name ucp \
|
||||
-v /var/run/docker.sock:/var/run/docker.sock \
|
||||
-v /tmp/backup.tar:/config/backup.tar \
|
||||
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} restore -i
|
||||
|
|
@ -164,7 +175,7 @@ UCP restore recovers the following assets from the backup file:
|
|||
authentication backends.
|
||||
|
||||
UCP restore does not include swarm assets such as cluster membership, services, networks,
|
||||
secrets, etc. [Learn to backup a swarm](https://docs.docker.com/engine/swarm/admin_guide/#back-up-the-swarm).
|
||||
secrets, etc. [Learn to backup a swarm](/engine/swarm/admin_guide/#back-up-the-swarm).
|
||||
|
||||
There are two ways to restore UCP:
|
||||
|
||||
|
|
@ -184,7 +195,7 @@ recommend making backups regularly.
|
|||
It is important to note that this procedure is not guaranteed to succeed with
|
||||
no loss of running services or configuration data. To properly protect against
|
||||
manager failures, the system should be configured for
|
||||
[high availability](configure/set-up-high-availability.md).
|
||||
[high availability](configure/join-nodes/index.md).
|
||||
|
||||
1. On one of the remaining manager nodes, perform `docker swarm init
|
||||
--force-new-cluster`. You may also need to specify an
|
||||
|
|
@ -201,10 +212,11 @@ manager failures, the system should be configured for
|
|||
5. Log in to UCP and browse to the nodes page, or use the CLI `docker node ls`
|
||||
command.
|
||||
6. If any nodes are listed as `down`, you'll have to manually [remove these
|
||||
nodes](../configure/scale-your-cluster.md) from the swarm and then re-join
|
||||
nodes](configure/scale-your-cluster.md) from the swarm and then re-join
|
||||
them using a `docker swarm join` operation with the swarm's new join-token.
|
||||
|
||||
## Where to go next
|
||||
{% elsif include.version=="ucp-2.2" %}
|
||||
|
||||
* [Set up high availability](configure/set-up-high-availability.md)
|
||||
* [UCP architecture](../architecture.md)
|
||||
Learn about [backups and disaster recovery](/datacenter/ucp/2.2/guides/admin/backups-and-disaster-recovery.md).
|
||||
|
||||
{% endif %}
|
||||
|
|
|
|||
|
|
@ -126,6 +126,10 @@ If you're load-balancing user requests to UCP across multiple manager nodes,
|
|||
when demoting those nodes into workers, don't forget to remove them from your
|
||||
load-balancing pool.
|
||||
|
||||
{% elsif include.version=="ucp-2.2" %}
|
||||
|
||||
Learn about [scaling your cluster](/datacenter/ucp/2.2/guides/admin/configure/scale-your-cluster.md).
|
||||
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
|
||||
|
|
@ -171,10 +175,5 @@ To remove the node, use:
|
|||
docker node rm <node-hostname>
|
||||
```
|
||||
|
||||
## Where to go next
|
||||
|
||||
* [Use your own TLS certificates](use-your-own-tls-certificates.md)
|
||||
* [Set up high availability](join-nodes/index.md)
|
||||
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ description: Learn how to deploy Docker Universal Control Plane using images tha
|
|||
keywords: UCP, Docker EE, image, IBM z, Windows
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: true
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ description: Learn how to install Docker Universal Control Plane on production.
|
|||
keywords: Universal Control Plane, UCP, install, Docker EE
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: true
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
|
|
|
|||
|
|
@ -5,7 +5,7 @@ description: Learn how to install Docker Universal Control Plane. on a machine w
|
|||
keywords: UCP, install, offline, Docker EE
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: true
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ description: Learn about the Docker Universal Control Plane architecture, and th
|
|||
keywords: UCP, install, Docker EE
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: true
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ description: Learn about the system requirements for installing Docker Universal
|
|||
keywords: UCP, architecture, requirements, Docker EE
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: true
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ description: Learn how to uninstall a Docker Universal Control Plane swarm.
|
|||
keywords: UCP, uninstall, install, Docker EE
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: true
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ description: Learn how to upgrade Docker Universal Control Plane on a machine wi
|
|||
keywords: ucp, upgrade, offline
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: true
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ description: Learn how to upgrade Docker Universal Control Plane with minimal im
|
|||
keywords: UCP, upgrade, update
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: true
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
|
|
|
|||
|
|
@ -2,13 +2,26 @@
|
|||
title: Monitor the cluster status
|
||||
description: Monitor your Docker Universal Control Plane installation, and learn how to troubleshoot it.
|
||||
keywords: UCP, troubleshoot, health, cluster
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
cli_tabs:
|
||||
- version: docker-cli-linux
|
||||
next_steps:
|
||||
- path: troubleshoot-with-logs/
|
||||
title: Troubleshoot with logs
|
||||
- path: troubleshoot-node-messages/
|
||||
title: Troubleshoot node states
|
||||
---
|
||||
{% if include.ui %}
|
||||
|
||||
{% if include.version=="ucp-3.0" %}
|
||||
|
||||
You can monitor the status of UCP by using the web UI or the CLI.
|
||||
You can also use the `_ping` endpoint to build monitoring automation.
|
||||
|
||||
## Check status from the UI
|
||||
|
||||
The first place to check the status of UCP is the UCP web UI, since it
|
||||
shows warnings for situations that require your immediate attention.
|
||||
Administrators might see more warnings than regular users.
|
||||
|
|
@ -27,22 +40,29 @@ Click the node to get more info on its status. In the details pane, click
|
|||
**Actions** and select **Agent logs** to see the log entries from the
|
||||
node.
|
||||
|
||||
{% elsif include.version=="ucp-2.2" %}
|
||||
|
||||
## Check status from the CLI
|
||||
Learn how to [monitor the cluster status](/datacenter/ucp/2.2/guides/admin/monitor-and-troubleshoot/index.md).
|
||||
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
|
||||
{% if include.cli %}
|
||||
|
||||
{% if include.version=="docker-cli-linux" %}
|
||||
|
||||
You can also monitor the status of a UCP cluster using the Docker CLI client.
|
||||
Download [a UCP client certificate bundle](../../user/access-ucp/cli-based-access.md)
|
||||
and then run:
|
||||
|
||||
```none
|
||||
$ docker node ls
|
||||
```bash
|
||||
docker node ls
|
||||
```
|
||||
|
||||
As a rule of thumb, if the status message starts with `[Pending]`, then the
|
||||
current state is transient and the node is expected to correct itself back
|
||||
into a healthy state. [Learn more about node status](troubleshoot-node-messages.md).
|
||||
|
||||
|
||||
## Monitoring automation
|
||||
|
||||
You can use the `https://<ucp-manager-url>/_ping` endpoint to check the health
|
||||
|
|
@ -64,9 +84,5 @@ URL of a manager node, and not a load balancer. In addition, please be aware tha
|
|||
pinging the endpoint with HEAD will result in a 404 error code. It is better to
|
||||
use GET instead.
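
For monitoring automation, the status code returned by a GET request can be mapped to a health state. The sketch below is an illustration only: the function names `classify_ping_status` and `check_ucp_manager` are hypothetical, it assumes that a `200` response means the manager is healthy and treats every other code as unhealthy, and it passes `--insecure` to `curl` on the assumption that the manager presents a certificate your host does not trust (drop that flag if it does).

```bash
# Hypothetical helper: map an HTTP status code from /_ping to a health state.
# Assumption: 200 means healthy; anything else is treated as unhealthy.
classify_ping_status() {
  case "$1" in
    200) echo "healthy" ;;
    *)   echo "unhealthy" ;;
  esac
}

# Hypothetical helper: send a GET request (not HEAD) to a single manager node
# and classify the result. Pass the manager URL, not a load balancer URL.
check_ucp_manager() {
  manager_url="$1"
  # --output /dev/null discards the body; --write-out prints only the status code.
  # curl prints 000 when the connection fails, which is classified as unhealthy.
  status="$(curl --silent --insecure --max-time 10 \
    --output /dev/null --write-out '%{http_code}' \
    "$manager_url/_ping")" || true
  classify_ping_status "$status"
}
```

A monitoring job could call `check_ucp_manager "https://<ucp-manager-url>"` once per manager node, since the endpoint reports on a single manager.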
|
||||
|
||||
|
||||
|
||||
## Where to go next
|
||||
|
||||
* [Troubleshoot with logs](troubleshoot-with-logs.md)
|
||||
* [Troubleshoot node states](./troubleshoot-node-messages.md)
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
|
|
|
|||
|
|
@ -2,7 +2,16 @@
|
|||
title: Troubleshoot cluster configurations
|
||||
description: Learn how to troubleshoot your Docker Universal Control Plane cluster.
|
||||
keywords: troubleshoot, etcd, rethinkdb, key, value, store, database, ucp, health, cluster
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
- path: ../../get-support/
|
||||
title: Get support
|
||||
---
|
||||
{% if include.version=="ucp-3.0" %}
|
||||
|
||||
UCP automatically tries to heal itself by monitoring its internal
|
||||
components and trying to bring them to a healthy state.
|
||||
|
|
@ -27,7 +36,7 @@ store REST API, and `jq` to process the responses.
|
|||
You can install these tools on an Ubuntu distribution by running:
|
||||
|
||||
```bash
|
||||
$ sudo apt-get update && apt-get install curl jq
|
||||
sudo apt-get update && sudo apt-get install curl jq
|
||||
```
|
||||
|
||||
1. Use a client bundle to authenticate your requests.
|
||||
|
|
@ -38,9 +47,9 @@ $ sudo apt-get update && apt-get install curl jq
|
|||
bundle.
|
||||
|
||||
```bash
|
||||
$ export KV_URL="https://$(echo $DOCKER_HOST | cut -f3 -d/ | cut -f1 -d:):12379"
|
||||
export KV_URL="https://$(echo $DOCKER_HOST | cut -f3 -d/ | cut -f1 -d:):12379"
|
||||
|
||||
$ curl -s \
|
||||
curl -s \
|
||||
--cert ${DOCKER_CERT_PATH}/cert.pem \
|
||||
--key ${DOCKER_CERT_PATH}/key.pem \
|
||||
--cacert ${DOCKER_CERT_PATH}/ca.pem \
|
||||
|
|
@ -58,7 +67,7 @@ client for etcd. You can run it using the `docker exec` command.
|
|||
The examples below assume you are logged in with ssh into a UCP manager node.
|
||||
|
||||
```bash
|
||||
$ docker exec -it ucp-kv etcdctl \
|
||||
docker exec -it ucp-kv etcdctl \
|
||||
--endpoint https://127.0.0.1:2379 \
|
||||
--ca-file /etc/docker/ssl/ca.pem \
|
||||
--cert-file /etc/docker/ssl/cert.pem \
|
||||
|
|
@ -143,6 +152,8 @@ time="2017-07-14T20:46:09Z" level=debug msg="(01/16) Emergency Repaired Table \"
|
|||
{% endraw %}
|
||||
```
|
||||
|
||||
## Where to go next
|
||||
{% elsif include.version=="ucp-2.2" %}
|
||||
|
||||
* [Get support](../../get-support.md)
|
||||
Learn how to [troubleshoot cluster configurations](/datacenter/ucp/2.2/guides/admin/monitor-and-troubleshoot/troubleshoot-configurations.md).
|
||||
|
||||
{% endif %}
|
||||
|
|
|
|||
|
|
@ -2,7 +2,13 @@
|
|||
title: Troubleshoot UCP node states
|
||||
description: Learn how to troubleshoot individual UCP nodes.
|
||||
keywords: UCP, troubleshoot, health, swarm
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
---
|
||||
{% if include.version=="ucp-3.0" %}
|
||||
|
||||
There are several cases in the lifecycle of UCP when a node is actively
|
||||
transitioning from one state to another, such as when a new node is joining the
|
||||
|
|
@ -27,3 +33,9 @@ UCP node, their explanation, and the expected duration of a given step.
|
|||
| Unhealthy UCP Controller: node is unreachable | Other manager nodes of the cluster have not received a heartbeat message from the affected node within a predetermined timeout. This usually indicates that there's either a temporary or permanent interruption in the network link to that manager node. Ensure the underlying networking infrastructure is operational, and [contact support](../../get-support.md) if the symptom persists. | Until resolved |
|
||||
| Unhealthy UCP Controller: unable to reach controller | The controller that we are currently communicating with is not reachable within a predetermined timeout. Please refresh the node listing to see if the symptom persists. If the symptom appears intermittently, this could indicate latency spikes between manager nodes, which can lead to temporary loss in the availability of UCP itself. Please ensure the underlying networking infrastructure is operational, and [contact support](../../get-support.md) if the symptom persists. | Until resolved |
|
||||
| Unhealthy UCP Controller: Docker Swarm Cluster: Local node `<ip>` has status Pending | The Engine ID of an engine is not unique in the swarm. When a node first joins the cluster, it's added to the node inventory and discovered as `Pending` by Docker Swarm. The engine is "validated" if a `ucp-swarm-manager` container can connect to it via TLS, and if its Engine ID is unique in the swarm. If you see this issue repeatedly, make sure that your engines don't have duplicate IDs. Use `docker info` to see the Engine ID. Refresh the ID by removing the `/etc/docker/key.json` file and restarting the daemon. | Until resolved |
|
||||
|
||||
{% elsif include.version=="ucp-2.2" %}
|
||||
|
||||
Learn how to [troubleshoot UCP node states](/datacenter/ucp/2.2/guides/admin/monitor-and-troubleshoot/troubleshoot-node-messages.md).
|
||||
|
||||
{% endif %}
|
||||
|
|
|
|||
|
|
@ -2,7 +2,16 @@
|
|||
title: Troubleshoot your cluster
|
||||
description: Learn how to troubleshoot your Docker Universal Control Plane cluster.
|
||||
keywords: ucp, troubleshoot, health, cluster
|
||||
ui_tabs:
|
||||
- version: ucp-3.0
|
||||
orhigher: false
|
||||
- version: ucp-2.2
|
||||
orlower: true
|
||||
next_steps:
|
||||
- path: troubleshoot-configurations/
|
||||
title: Troubleshoot configurations
|
||||
---
|
||||
{% if include.version=="ucp-3.0" %}
|
||||
|
||||
If you detect problems in your UCP cluster, you can start your troubleshooting
|
||||
session by checking the logs of the
|
||||
|
|
@ -96,7 +105,8 @@ transition to a different state. The `ucp-reconcile` container is responsible
|
|||
for creating and removing containers, issuing certificates, and pulling
|
||||
missing images.
|
||||
|
||||
{% elsif include.version=="ucp-2.2" %}
|
||||
|
||||
## Where to go next
|
||||
Learn how to [troubleshoot your cluster](/datacenter/ucp/2.2/guides/admin/monitor-and-troubleshoot/troubleshoot-with-logs.md).
|
||||
|
||||
* [Troubleshoot configurations](troubleshoot-configurations.md)
|
||||
{% endif %}
|
||||
|
|
|
|||