# Pod Metrics

| Metric name | Metric type | Labels/tags | Status |
| ---------- | ----------- | ----------- | ----------- |
| kube_pod_info | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `host_ip`=<host-ip> <br> `pod_ip`=<pod-ip> <br> `node`=<node-name><br> `created_by_kind`=<created_by_kind><br> `created_by_name`=<created_by_name><br> `uid`=<pod-uid><br> `priority_class`=<priority_class><br> `host_network`=<host_network>| STABLE |
| kube_pod_start_time | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_completion_time | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_owner | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `owner_kind`=<owner kind> <br> `owner_name`=<owner name> <br> `owner_is_controller`=<whether owner is controller> | STABLE |
| kube_pod_labels | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `label_POD_LABEL`=<POD_LABEL> | STABLE |
| kube_pod_status_phase | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `phase`=<Pending\|Running\|Succeeded\|Failed\|Unknown> | STABLE |
| kube_pod_status_ready | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `condition`=<true\|false\|unknown> | STABLE |
| kube_pod_status_scheduled | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `condition`=<true\|false\|unknown> | STABLE |
| kube_pod_container_info | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `image`=<image-name> <br> `image_id`=<image-id> <br> `container_id`=<containerid> | STABLE |
| kube_pod_container_status_waiting | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_container_status_waiting_reason | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `reason`=<container-waiting-reason> | EXPERIMENTAL |
| kube_pod_container_status_running | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_container_state_started | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_container_status_terminated | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_container_status_terminated_reason | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `reason`=<container-terminated-reason> | EXPERIMENTAL |
| kube_pod_container_status_last_terminated_reason | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `reason`=<last-terminated-reason> | EXPERIMENTAL |
| kube_pod_container_status_ready | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_container_status_restarts_total | Counter | `container`=<container-name> <br> `namespace`=<pod-namespace> <br> `pod`=<pod-name> | STABLE |
| kube_pod_container_resource_requests | Gauge | `resource`=<resource-name> <br> `unit`=<resource-unit> <br> `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `node`=<node-name> | EXPERIMENTAL |
| kube_pod_container_resource_limits | Gauge | `resource`=<resource-name> <br> `unit`=<resource-unit> <br> `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `node`=<node-name> | EXPERIMENTAL |
| kube_pod_overhead_cpu_cores | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_overhead_memory_bytes | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_runtimeclass_name_info | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_created | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_deletion_timestamp | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_restart_policy | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `type`=<Always\|Never\|OnFailure> | STABLE |
| kube_pod_init_container_info | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `image`=<image-name> <br> `image_id`=<image-id> <br> `container_id`=<containerid> | STABLE |
| kube_pod_init_container_status_waiting | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_init_container_status_waiting_reason | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `reason`=<container-waiting-reason> | EXPERIMENTAL |
| kube_pod_init_container_status_running | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_init_container_status_terminated | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_init_container_status_terminated_reason | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `reason`=<container-terminated-reason> | EXPERIMENTAL |
| kube_pod_init_container_status_last_terminated_reason | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `reason`=<last-terminated-reason> | EXPERIMENTAL |
| kube_pod_init_container_status_ready | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_init_container_status_restarts_total | Counter | `container`=<container-name> <br> `namespace`=<pod-namespace> <br> `pod`=<pod-name> | STABLE |
| kube_pod_init_container_resource_limits | Gauge | `resource`=<resource-name> <br> `unit`=<resource-unit> <br> `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_init_container_resource_limits_cpu_cores | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_init_container_resource_limits_memory_bytes | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_init_container_resource_limits_storage_bytes | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_init_container_resource_limits_ephemeral_storage_bytes | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_init_container_resource_requests | Gauge | `resource`=<resource-name> <br> `unit`=<resource-unit> <br> `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_init_container_resource_requests_cpu_cores | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_init_container_resource_requests_memory_bytes | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_init_container_resource_requests_storage_bytes | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_init_container_resource_requests_ephemeral_storage_bytes | Gauge | `container`=<container-name> <br> `pod`=<pod-name> <br> `namespace`=<pod-namespace> | EXPERIMENTAL |
| kube_pod_spec_volumes_persistentvolumeclaims_info | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `volume`=<volume-name> <br> `persistentvolumeclaim`=<persistentvolumeclaim-claimname> | STABLE |
| kube_pod_spec_volumes_persistentvolumeclaims_readonly | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `volume`=<volume-name> <br> `persistentvolumeclaim`=<persistentvolumeclaim-claimname> | STABLE |
| kube_pod_status_reason | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> <br> `reason`=<NodeLost\|Evicted\|UnexpectedAdmissionError> | EXPERIMENTAL |
| kube_pod_status_scheduled_time | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
| kube_pod_status_unschedulable | Gauge | `pod`=<pod-name> <br> `namespace`=<pod-namespace> | STABLE |
## Useful metrics queries

### How to retrieve non-standard Pod state
Some Pod states, such as "Terminating" and "Unknown", are not straightforward to retrieve because they are not stored in a field of `Pod.Status`.

To mimic the [logic](https://github.com/kubernetes/kubernetes/blob/v1.17.3/pkg/printers/internalversion/printers.go#L624) used by the `kubectl` command line, you need to combine multiple metrics.

For example:

* To get the list of Pods in the `Unknown` state, you can run the following PromQL query: `sum(kube_pod_status_phase{phase="Unknown"}) by (namespace, pod) or (count(kube_pod_deletion_timestamp) by (namespace, pod) * sum(kube_pod_status_reason{reason="NodeLost"}) by(namespace, pod))`
* For Pods in the `Terminating` state: `count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod)`

Here is an example of a Prometheus rule that can be used to alert on a Pod that has been in the `Terminating` state for more than `5m`.
```yaml
groups:
- name: Pod state
  rules:
  - alert: PodsBlockInTerminatingState
    # Pods that have a deletion timestamp and are not marked NodeLost.
    expr: count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod) > 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: Pod {{ $labels.namespace }}/{{ $labels.pod }} is stuck in Terminating state.
```
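The way the queries above combine `kube_pod_deletion_timestamp` with `kube_pod_status_reason{reason="NodeLost"}` can be hard to read in PromQL. The following is a minimal Python sketch of that decision, written for illustration only. The function name `classify_pods` and the sample data are hypothetical, and the sketch covers only Pods that have a deletion timestamp. It leaves out the `kube_pod_status_phase{phase="Unknown"}` part of the `Unknown` query.

```python
from typing import Dict, Set, Tuple

PodKey = Tuple[str, str]  # (namespace, pod)


def classify_pods(
    deletion_timestamp: Set[PodKey],
    node_lost: Dict[PodKey, int],
) -> Dict[PodKey, str]:
    """Label each Pod that has a deletion timestamp as Terminating or Unknown.

    deletion_timestamp: Pods for which kube_pod_deletion_timestamp exists.
    node_lost: value (0 or 1) of kube_pod_status_reason{reason="NodeLost"}.
    """
    states: Dict[PodKey, str] = {}
    for key in deletion_timestamp:
        # A Pod without a NodeLost sample matches neither query, like a
        # PromQL binary operation with no matching series on one side.
        if key not in node_lost:
            continue
        states[key] = "Unknown" if node_lost[key] == 1 else "Terminating"
    return states


# Hypothetical sample data, not taken from a real cluster.
deleting = {("default", "web-0"), ("default", "web-1")}
node_lost = {
    ("default", "web-0"): 0,
    ("default", "web-1"): 1,
    ("default", "web-2"): 0,  # no deletion timestamp, so it is not classified
}

states = classify_pods(deleting, node_lost)
print(states[("default", "web-0")])  # Terminating
print(states[("default", "web-1")])  # Unknown
```

Keying both inputs on `(namespace, pod)` mirrors the `by (namespace, pod)` aggregation, which is what lets the two metrics be matched against each other.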