Port monitoring topics to template (#388)

This commit is contained in:
Jim Galasyn 2018-01-03 17:53:34 -08:00
parent e37b7ccc6a
commit 1b9d4807b7
14 changed files with 116 additions and 56 deletions


@ -1,13 +1,24 @@
---
title: Backups and disaster recovery
description: Learn how to back up your Docker Universal Control Plane swarm, and to recover your swarm from an existing backup.
keywords: ucp, backup, restore, recovery
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:
- path: configure/join-nodes/
title: Set up high availability
- path: ../ucp-architecture/
title: UCP architecture
---
{% if include.version=="ucp-3.0" %}
When you decide to start using Docker Universal Control Plane in a production
setting, you should
[configure it for high availability](configure/join-nodes/index.md).
The next step is creating a backup policy and disaster recovery plan.
@ -25,7 +36,7 @@ UCP maintains data about:
| Volumes | All [UCP named volumes](../architecture/#volumes-used-by-ucp), which include all UCP component certs and data |
This data is persisted on the host running UCP, using named volumes.
[Learn more about UCP named volumes](../ucp-architecture.md).
## Backup steps
@ -33,18 +44,18 @@ Back up your Docker EE components in the following order:
1. [Back up your swarm](/engine/swarm/admin_guide/#back-up-the-swarm)
2. Back up UCP
3. [Back up DTR](../../../../dtr/2.5/guides/admin/backups-and-disaster-recovery.md)
## Backup policy
As part of your backup policy, you should regularly create backups of UCP.
DTR is backed up independently.
[Learn about DTR backups and recovery](../../../../dtr/2.5/guides/admin/backups-and-disaster-recovery.md).
To create a UCP backup, run the `{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup` command
on a single UCP manager. This command creates a tar archive with the
contents of all the [volumes used by UCP](../ucp-architecture.md) to persist data
and streams it to `stdout`. The backup doesn't include the swarm-mode state,
like service definitions and overlay network definitions.
You only need to run the backup command on a single UCP manager node. Since UCP
@ -66,7 +77,7 @@ temporarily unable to:
To minimize the impact of the backup policy on your business, you should:
* Configure UCP for [high availability](configure/join-nodes/index.md).
This lets you load-balance user requests across multiple UCP manager nodes.
* Schedule the backup to take place outside business hours.
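For instance, a crontab entry along these lines can automate a weekly off-hours backup (illustrative only; the schedule and the `/tmp/backup.tar` output path are assumptions to adapt to your environment):

```none
# Illustrative crontab entry: run a UCP backup at 02:00 every Sunday
0 2 * * 0 docker container run --log-driver none --rm --name ucp -v /var/run/docker.sock:/var/run/docker.sock {{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup > /tmp/backup.tar
```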
@ -77,14 +88,14 @@ verify its contents:
```none
# Create a backup and store it in /tmp/backup.tar
docker container run --log-driver none --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup --interactive > /tmp/backup.tar
# Ensure the backup is a valid tar and list its contents
# In a valid backup file, over 100 files should appear in the list
# and the `./ucp-node-certs/key.pem` file should be present
tar --list -f /tmp/backup.tar
```
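The validity check above can be sketched as a small helper (illustrative; the function name is made up, and the archive path and 100-file threshold come from the comments in the example):

```bash
# Illustrative helper: report whether a UCP backup archive looks valid.
# A valid backup contains well over 100 entries, including ucp-node-certs/key.pem.
validate_ucp_backup() {
  local archive="$1"
  local count
  count=$(tar --list -f "$archive" | wc -l)
  if [ "$count" -gt 100 ] && tar --list -f "$archive" | grep -q 'ucp-node-certs/key.pem'; then
    echo "valid"
  else
    echo "invalid"
  fi
}

# Usage: validate_ucp_backup /tmp/backup.tar
```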
A backup file may optionally be encrypted using a passphrase, as in the
@ -92,13 +103,13 @@ following example:
```none
# Create a backup, encrypt it, and store it in /tmp/backup.tar
docker container run --log-driver none --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup --interactive \
--passphrase "secret" > /tmp/backup.tar
# Decrypt the backup and list its contents
gpg --decrypt /tmp/backup.tar | tar --list
```
### Security-Enhanced Linux (SELinux)
@ -108,7 +119,7 @@ which is typical for RHEL hosts, you need to include `--security-opt label=disab
in the `docker` command:
```bash
docker container run --security-opt label=disable --log-driver none --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} backup --interactive > /tmp/backup.tar
```
@ -129,7 +140,7 @@ UCP from an existing backup file, presumed to be located at
`/tmp/backup.tar`:
```none
docker container run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} restore < /tmp/backup.tar
```
@ -138,17 +149,17 @@ If the backup file is encrypted with a passphrase, you will need to provide the
passphrase to the restore operation:
```none
docker container run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} restore --passphrase "secret" < /tmp/backup.tar
```
The restore command may also be invoked in interactive mode, in which case the
backup file should be mounted to the container rather than streamed through
`stdin`:
```none
docker container run --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp/backup.tar:/config/backup.tar \
{{ page.ucp_org }}/{{ page.ucp_repo }}:{{ page.ucp_version }} restore -i
@ -164,7 +175,7 @@ UCP restore recovers the following assets from the backup file:
authentication backends.
UCP restore does not include swarm assets such as cluster membership, services, networks,
secrets, etc. [Learn how to back up a swarm](/engine/swarm/admin_guide/#back-up-the-swarm).
There are two ways to restore UCP:
@ -184,7 +195,7 @@ recommend making backups regularly.
It is important to note that this procedure is not guaranteed to succeed with
no loss of running services or configuration data. To properly protect against
manager failures, the system should be configured for
[high availability](configure/join-nodes/index.md).
1. On one of the remaining manager nodes, perform `docker swarm init
--force-new-cluster`. You may also need to specify an
@ -201,10 +212,11 @@ manager failures, the system should be configured for
5. Log in to UCP and browse to the nodes page, or use the CLI `docker node ls`
command.
6. If any nodes are listed as `down`, you'll have to manually [remove these
nodes](configure/scale-your-cluster.md) from the swarm and then re-join
them using a `docker swarm join` operation with the swarm's new join-token.
{% elsif include.version=="ucp-2.2" %}
Learn about [backups and disaster recovery](/datacenter/ucp/2.2/guides/admin/backups-and-disaster-recovery.md).
{% endif %}


@ -126,6 +126,10 @@ If you're load-balancing user requests to UCP across multiple manager nodes,
when you demote those nodes to workers, don't forget to remove them from your
load-balancing pool.
{% elsif include.version=="ucp-2.2" %}
Learn about [scaling your cluster](/datacenter/ucp/2.2/guides/admin/configure/scale-your-cluster.md).
{% endif %}
{% endif %}
@ -171,10 +175,5 @@ To remove the node, use:
docker node rm <node-hostname>
```
{% endif %}
{% endif %}


@ -4,7 +4,7 @@ description: Learn how to deploy Docker Universal Control Plane using images tha
keywords: UCP, Docker EE, image, IBM z, Windows
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:


@ -4,7 +4,7 @@ description: Learn how to install Docker Universal Control Plane on production.
keywords: Universal Control Plane, UCP, install, Docker EE
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:


@ -5,7 +5,7 @@ description: Learn how to install Docker Universal Control Plane on a machine w
keywords: UCP, install, offline, Docker EE
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:


@ -4,7 +4,7 @@ description: Learn about the Docker Universal Control Plane architecture, and th
keywords: UCP, install, Docker EE
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:


@ -4,7 +4,7 @@ description: Learn about the system requirements for installing Docker Universal
keywords: UCP, architecture, requirements, Docker EE
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:


@ -4,7 +4,7 @@ description: Learn how to uninstall a Docker Universal Control Plane swarm.
keywords: UCP, uninstall, install, Docker EE
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:


@ -4,7 +4,7 @@ description: Learn how to upgrade Docker Universal Control Plane on a machine wi
keywords: ucp, upgrade, offline
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:


@ -4,7 +4,7 @@ description: Learn how to upgrade Docker Universal Control Plane with minimal im
keywords: UCP, upgrade, update
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:


@ -2,13 +2,26 @@
title: Monitor the cluster status
description: Monitor your Docker Universal Control Plane installation, and learn how to troubleshoot it.
keywords: UCP, troubleshoot, health, cluster
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
cli_tabs:
- version: docker-cli-linux
next_steps:
- path: troubleshoot-with-logs/
title: Troubleshoot with logs
- path: troubleshoot-node-messages/
title: Troubleshoot node states
---
{% if include.ui %}
{% if include.version=="ucp-3.0" %}
You can monitor the status of UCP by using the web UI or the CLI.
You can also use the `_ping` endpoint to build monitoring automation.
## Check status from the UI
The first place to check the status of UCP is the UCP web UI, since it
shows warnings for situations that require your immediate attention.
Administrators might see more warnings than regular users.
@ -27,22 +40,29 @@ Click the node to get more info on its status. In the details pane, click
**Actions** and select **Agent logs** to see the log entries from the
node.
{% elsif include.version=="ucp-2.2" %}
Learn how to [monitor the cluster status](/datacenter/ucp/2.2/guides/admin/monitor-and-troubleshoot/index.md).
{% endif %}
{% endif %}
{% if include.cli %}
{% if include.version=="docker-cli-linux" %}
You can also monitor the status of a UCP cluster using the Docker CLI client.
Download [a UCP client certificate bundle](../../user/access-ucp/cli-based-access.md)
and then run:
```bash
docker node ls
```
As a rule of thumb, if the status message starts with `[Pending]`, then the
current state is transient and the node is expected to correct itself back
into a healthy state. [Learn more about node status](troubleshoot-node-messages.md).
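That rule of thumb can be expressed as a tiny classifier (illustrative only; the function name is made up, and the status strings come from the paragraph above):

```bash
# Classify a UCP node status message: states beginning with "[Pending]"
# are transient and expected to resolve on their own.
node_state_kind() {
  case "$1" in
    "[Pending]"*) echo "transient" ;;
    *)            echo "steady"    ;;
  esac
}
```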
## Monitoring automation
You can use the `https://<ucp-manager-url>/_ping` endpoint to check the health
@ -64,9 +84,5 @@ URL of a manager node, and not a load balancer. In addition, be aware that
pinging the endpoint with `HEAD` results in a 404 error code. Use `GET` instead.
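A minimal probe along these lines could drive such automation (a sketch; the manager address passed to the function is a placeholder, and `-k` skips TLS verification for illustration only):

```bash
# Illustrative health probe: succeeds only when /_ping answers 200.
# Use GET, not HEAD: a HEAD request to /_ping yields a 404.
ucp_healthy() {
  local code
  code=$(curl -k -s -o /dev/null -w '%{http_code}' "https://$1/_ping")
  [ "$code" = "200" ]
}

# Usage: ucp_healthy ucp.example.com && echo "healthy" || echo "unhealthy"
```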
{% endif %}
{% endif %}


@ -2,7 +2,16 @@
title: Troubleshoot cluster configurations
description: Learn how to troubleshoot your Docker Universal Control Plane cluster.
keywords: troubleshoot, etcd, rethinkdb, key, value, store, database, ucp, health, cluster
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:
- path: ../../get-support/
title: Get support
---
{% if include.version=="ucp-3.0" %}
UCP automatically tries to heal itself by monitoring its internal
components and trying to bring them to a healthy state.
@ -27,7 +36,7 @@ store REST API, and `jq` to process the responses.
You can install these tools on an Ubuntu distribution by running:
```bash
sudo apt-get update && sudo apt-get install curl jq
```
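As a quick illustration of how `jq` helps here, it can pull a single field out of a JSON response (the payload below is a made-up sample, not a real key-value store response):

```bash
# Extract the "health" field from a sample JSON document
echo '{"health": "true"}' | jq -r '.health'
# prints: true
```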
1. Use a client bundle to authenticate your requests.
@ -38,9 +47,9 @@ $ sudo apt-get update && apt-get install curl jq
bundle.
```bash
export KV_URL="https://$(echo $DOCKER_HOST | cut -f3 -d/ | cut -f1 -d:):12379"
curl -s \
--cert ${DOCKER_CERT_PATH}/cert.pem \
--key ${DOCKER_CERT_PATH}/key.pem \
--cacert ${DOCKER_CERT_PATH}/ca.pem \
@ -58,7 +67,7 @@ client for etcd. You can run it using the `docker exec` command.
The examples below assume you are logged in via SSH to a UCP manager node.
```bash
docker exec -it ucp-kv etcdctl \
--endpoint https://127.0.0.1:2379 \
--ca-file /etc/docker/ssl/ca.pem \
--cert-file /etc/docker/ssl/cert.pem \
@ -143,6 +152,8 @@ time="2017-07-14T20:46:09Z" level=debug msg="(01/16) Emergency Repaired Table \"
{% endraw %}
```
{% elsif include.version=="ucp-2.2" %}
Learn how to [troubleshoot cluster configurations](/datacenter/ucp/2.2/guides/admin/monitor-and-troubleshoot/troubleshoot-configurations.md).
{% endif %}


@ -2,7 +2,13 @@
title: Troubleshoot UCP node states
description: Learn how to troubleshoot individual UCP nodes.
keywords: UCP, troubleshoot, health, swarm
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
---
{% if include.version=="ucp-3.0" %}
There are several cases in the lifecycle of UCP when a node is actively
transitioning from one state to another, such as when a new node is joining the
@ -27,3 +33,9 @@ UCP node, their explanation, and the expected duration of a given step.
| Unhealthy UCP Controller: node is unreachable | Other manager nodes of the cluster have not received a heartbeat message from the affected node within a predetermined timeout. This usually indicates that there's either a temporary or permanent interruption in the network link to that manager node. Ensure the underlying networking infrastructure is operational, and [contact support](../../get-support.md) if the symptom persists. | Until resolved |
| Unhealthy UCP Controller: unable to reach controller | The controller that we are currently communicating with is not reachable within a predetermined timeout. Please refresh the node listing to see if the symptom persists. If the symptom appears intermittently, this could indicate latency spikes between manager nodes, which can lead to temporary loss in the availability of UCP itself. Please ensure the underlying networking infrastructure is operational, and [contact support](../../get-support.md) if the symptom persists. | Until resolved |
| Unhealthy UCP Controller: Docker Swarm Cluster: Local node `<ip>` has status Pending | The Engine ID of an engine is not unique in the swarm. When a node first joins the cluster, it's added to the node inventory and discovered as `Pending` by Docker Swarm. The engine is "validated" if a `ucp-swarm-manager` container can connect to it via TLS, and if its Engine ID is unique in the swarm. If you see this issue repeatedly, make sure that your engines don't have duplicate IDs. Use `docker info` to see the Engine ID. Refresh the ID by removing the `/etc/docker/key.json` file and restarting the daemon. | Until resolved |
{% elsif include.version=="ucp-2.2" %}
Learn how to [troubleshoot UCP node states](/datacenter/ucp/2.2/guides/admin/monitor-and-troubleshoot/troubleshoot-node-messages.md).
{% endif %}


@ -2,7 +2,16 @@
title: Troubleshoot your cluster
description: Learn how to troubleshoot your Docker Universal Control Plane cluster.
keywords: ucp, troubleshoot, health, cluster
ui_tabs:
- version: ucp-3.0
orhigher: false
- version: ucp-2.2
orlower: true
next_steps:
- path: troubleshoot-configurations/
title: Troubleshoot configurations
---
{% if include.version=="ucp-3.0" %}
If you detect problems in your UCP cluster, you can start your troubleshooting
session by checking the logs of the
@ -96,7 +105,8 @@ transition to a different state. The `ucp-reconcile` container is responsible
for creating and removing containers, issuing certificates, and pulling
missing images.
{% elsif include.version=="ucp-2.2" %}
Learn how to [troubleshoot with logs](/datacenter/ucp/2.2/guides/admin/monitor-and-troubleshoot/troubleshoot-with-logs.md).
{% endif %}