Merge pull request #8271 from docker/ucp-metrics

Adding info for UCP metrics (with Prometheus)
2019-02-17 15:15:55 -05:00 · 2019-02-17 15:15:55 -05:00 · 7d407ed0da
parent e15415000f f92d35285e
commit 7d407ed0da
2 changed files with 88 additions and 0 deletions
--- a/_data/toc.yaml
+++ b/_data/toc.yaml
@@ -1182,6 +1182,8 @@ manuals:
          title: Add SANs to cluster certificates
        - path: /ee/ucp/admin/configure/collect-cluster-metrics/
          title: Collect UCP cluster metrics with Prometheus
+        - path: /ee/ucp/admin/configure/metrics-descriptions/
+          title: Using UCP cluster metrics with Prometheus
        - path: /ee/ucp/admin/configure/configure-rbac-kube/
          title: Configure native Kubernetes role-based access control
        - path: /ee/ucp/admin/configure/create-audit-logs/
--- a/ee/ucp/admin/configure/metrics-descriptions.md
+++ b/ee/ucp/admin/configure/metrics-descriptions.md
@@ -0,0 +1,86 @@
+---
+description: Using UCP cluster metrics with Prometheus
+keywords: prometheus, metrics, ucp
+title: Using UCP cluster metrics with Prometheus
+redirect_from:
+- /engine/admin/prometheus/
+---
+
+# UCP metrics
+
+The following table lists the metrics that UCP exposes in Prometheus, along with their descriptions. Only metrics
+prefixed with `ucp_` are documented. UCP exposes other metrics in Prometheus, but they are not documented.
+
+| Name                                                    | Units                | Description                                                                                                                                                                                                                                                                     | Labels                                         | Metric source |
+|---------------------------------------------------------|----------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------|---------------|
+| `ucp_controller_services`                               | number of services   | The total number of Swarm services                                                                                                                                                                                                                                              |                                                | Controller    |
+| `ucp_engine_container_cpu_percent`                      | percentage           | The percentage of CPU time this container is using.                                                                                                                                                                                                                             | container labels                               | Node          |
+| `ucp_engine_container_cpu_total_time_nanoseconds`       | nanoseconds          | Total CPU time used by this container in nanoseconds                                                                                                                                                                                                                            | container labels                               | Node          |
+| `ucp_engine_container_health`                           | 0.0 or 1.0           | Whether or not this container is healthy, according to its healthcheck. Note that if this value is 0, it just means that the container is not reporting healthy; it might not have a healthcheck defined at all, or its healthcheck might not have returned any results yet     | container labels                               | Node          |
+| `ucp_engine_container_memory_max_usage_bytes`           | bytes                | Maximum memory used by this container in bytes                                                                                                                                                                                                                                  | container labels                               | Node          |
+| `ucp_engine_container_memory_usage_bytes`               | bytes                | Current memory used by this container in bytes                                                                                                                                                                                                                                  | container labels                               | Node          |
+| `ucp_engine_container_memory_usage_percent`             | percentage           | Percentage of total node memory currently being used by this container                                                                                                                                                                                                          | container labels                               | Node          |
+| `ucp_engine_container_network_rx_bytes_total`           | bytes                | Number of bytes received by this container on this network in the last sample                                                                                                                                                                                                   | container networking labels                    | Node          |
+| `ucp_engine_container_network_rx_dropped_packets_total` | number of packets    | Number of packets bound for this container on this network that were dropped in the last sample                                                                                                                                                                                 | container networking labels                    | Node          |
+| `ucp_engine_container_network_rx_errors_total`          | number of errors     | Number of received network errors for this container on this network in the last sample                                                                                                                                                                                         | container networking labels                    | Node          |
+| `ucp_engine_container_network_rx_packets_total`         | number of packets    | Number of received packets for this container on this network in the last sample                                                                                                                                                                                                | container networking labels                    | Node          |
+| `ucp_engine_container_network_tx_bytes_total`           | bytes                | Number of bytes sent by this container on this network in the last sample                                                                                                                                                                                                       | container networking labels                    | Node          |
+| `ucp_engine_container_network_tx_dropped_packets_total` | number of packets    | Number of packets sent from this container on this network that were dropped in the last sample                                                                                                                                                                                 | container networking labels                    | Node          |
+| `ucp_engine_container_network_tx_errors_total`          | number of errors     | Number of sent network errors for this container on this network in the last sample                                                                                                                                                                                             | container networking labels                    | Node          |
+| `ucp_engine_container_network_tx_packets_total`         | number of packets    | Number of sent packets for this container on this network in the last sample                                                                                                                                                                                                    | container networking labels                    | Node          |
+| `ucp_engine_container_unhealth`                         | 0.0 or 1.0           | Whether or not this container is unhealthy, according to its healthcheck. Note that if this value is 0, it just means that the container is not reporting unhealthy; it might not have a healthcheck defined at all, or its healthcheck might not have returned any results yet | container labels                               | Node          |
+| `ucp_engine_containers`                                 | number of containers | Total number of containers on this node                                                                                                                                                                                                                                         | node labels                                    | Node          |
+| `ucp_engine_cpu_total_time_nanoseconds`                 | nanoseconds          | System CPU time used by this container in nanoseconds                                                                                                                                                                                                                           | container labels                               | Node          |
+| `ucp_engine_disk_free_bytes`                            | bytes                | Free disk space on the Docker root directory on this node in bytes. Note that this metric is not available for Windows nodes                                                                                                                                                    | node labels                                    | Node          |
+| `ucp_engine_disk_total_bytes`                           | bytes                | Total disk space on the Docker root directory on this node in bytes. Note that this metric is not available for Windows nodes                                                                                                                                                   | node labels                                    | Node          |
+| `ucp_engine_images`                                     | number of images     | Total number of images on this node                                                                                                                                                                                                                                             | node labels                                    | Node          |
+| `ucp_engine_memory_total_bytes`                         | bytes                | Total amount of memory on this node in bytes                                                                                                                                                                                                                                    | node labels                                    | Node          |
+| `ucp_engine_networks`                                   | number of networks   | Total number of networks on this node                                                                                                                                                                                                                                           | node labels                                    | Node          |
+| `ucp_engine_node_health`                                | 0.0 or 1.0           | Whether or not this node is healthy, as determined by UCP                                                                                                                                                                                                                       | nodeName: node name, nodeAddr: node IP address | Controller    |
+| `ucp_engine_num_cpu_cores`                              | number of cores      | Number of CPU cores on this node                                                                                                                                                                                                                                                | node labels                                    | Node          |
+| `ucp_engine_pod_container_ready`                        | 0.0 or 1.0           | Whether or not this container in a Kubernetes pod is ready, as determined by its readiness probe.                                                                                                                                                                               | pod labels                                     | Controller    |
+| `ucp_engine_pod_ready`                                  | 0.0 or 1.0           | Whether or not this Kubernetes pod is ready, as determined by its readiness probe.                                                                                                                                                                                              | pod labels                                     | Controller    |
+| `ucp_engine_volumes`                                    | number of volumes    | Total number of volumes on this node                                                                                                                                                                                                                                            | node labels                                    | Node          |
+
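As a minimal sketch of how the byte-valued metrics in the table can be combined, the following Python function derives a disk usage percentage from `ucp_engine_disk_free_bytes` and `ucp_engine_disk_total_bytes`. The function name and the sample values are hypothetical and are not part of UCP; the sketch assumes both values were read from the same node at roughly the same time.

```python
def disk_used_percent(free_bytes: float, total_bytes: float) -> float:
    """Return the percentage of the Docker root directory that is in use.

    free_bytes  -- a sample of ucp_engine_disk_free_bytes
    total_bytes -- a sample of ucp_engine_disk_total_bytes
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 <= free_bytes <= total_bytes:
        raise ValueError("free_bytes must be between 0 and total_bytes")
    return 100.0 * (total_bytes - free_bytes) / total_bytes


# Hypothetical sample values: 25 GB free out of 100 GB.
print(disk_used_percent(25e9, 100e9))  # 75.0
```

Because the disk metrics are not available for Windows nodes, a caller would need to skip nodes that report no samples for them.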
+## Metrics labels
+
+Metrics exposed by UCP in Prometheus have standardized labels, depending on the resource that they are measuring.
+The following tables list some of the labels that are used, along with their values:
+
+### Container labels
+
+| Label name         | Value                                                                                       |
+|--------------------|---------------------------------------------------------------------------------------------|
+| `collection`       | The collection ID of the collection this container is in, if any                            |
+| `container`        | The ID of this container                                                                    |
+| `image`            | The name of this container's image                                                          |
+| `manager`          | "true" if the container's node is a UCP manager, "false" otherwise                          |
+| `name`             | The name of the container                                                                   |
+| `podName`          | If this container is part of a Kubernetes pod, this is the pod's name                       |
+| `podNamespace`     | If this container is part of a Kubernetes pod, this is the pod's namespace                  |
+| `podContainerName` | If this container is part of a Kubernetes pod, this is the container's name in the pod spec |
+| `service`          | If this container is part of a Swarm service, this is the service ID                        |
+| `stack`            | If this container is part of a Docker Compose stack, this is the name of the stack          |
+
+### Container networking labels
+
+Metrics with container networking labels measure network activity for a given network attached to a given
+container. They carry all of the container labels listed above, with one addition:
+
+| Label name | Value                 |
+|------------|-----------------------|
+| `network`  | The ID of the network |
+
+### Node labels
+
+| Label name | Value                                                  |
+|------------|--------------------------------------------------------|
+| `manager`  | "true" if the node is a UCP manager, "false" otherwise |
+
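The labels above can be used to filter metrics in a Prometheus query. The following Python sketch builds a PromQL selector string with exact-match label filters. The helper name `promql_selector` and the label values are hypothetical; only the `metric{label="value"}` selector syntax comes from PromQL itself. The sketch escapes backslashes and double quotes only, so it assumes label values contain no newlines.

```python
def promql_selector(metric, labels):
    """Build a PromQL instant-vector selector with exact-match label filters."""

    def escape(value):
        # Escape backslashes first, then double quotes, so the value stays
        # a valid double-quoted PromQL string.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    if not labels:
        return metric
    # Sort the label names so the output is deterministic.
    matchers = ",".join(
        f'{name}="{escape(value)}"' for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


# Memory usage of containers in a hypothetical stack named "web"
# that run on worker (non-manager) nodes.
print(promql_selector(
    "ucp_engine_container_memory_usage_bytes",
    {"stack": "web", "manager": "false"},
))
# ucp_engine_container_memory_usage_bytes{manager="false",stack="web"}
```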
+## Metric source
+
+UCP exports metrics on every node and also exports additional metrics from
+every controller. The metrics that are exported from controllers are
+cluster-scoped, for example, the total number of Swarm services. Metrics that
+are exported from nodes are specific to those nodes, for example, the total memory
+on that node.
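
To illustrate the difference between node-scoped and cluster-scoped metrics, the following Python sketch sums a node-scoped metric such as `ucp_engine_containers` over the output collected from several nodes. It assumes the samples are available as Prometheus text exposition format, one string per node. The node names, sample values, and helper names are hypothetical, and the parser is deliberately simplified (it handles plain sample lines and skips comments), so treat it as a sketch rather than a full implementation of the format.

```python
import re

# Matches one sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# Matches one label="value" pair inside the braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def cluster_total(per_node_text, metric):
    """Sum a node-scoped metric over the output of every node."""
    total = 0.0
    for text in per_node_text.values():
        for name, _labels, value in parse_samples(text):
            if name == metric:
                total += value
    return total


# Hypothetical output from two nodes.
scrapes = {
    "node-a": '# illustrative comment line\nucp_engine_containers{manager="true"} 12\n',
    "node-b": 'ucp_engine_containers{manager="false"} 30\nucp_engine_images{manager="false"} 7\n',
}
print(cluster_total(scrapes, "ucp_engine_containers"))  # 42.0
```

A cluster-scoped metric such as `ucp_controller_services` already describes the whole cluster, so summing it across controllers in this way would overcount.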