Pod Metrics
| Metric name | Metric type | Labels/tags | Status |
|---|---|---|---|
| kube_pod_info | Gauge | pod=<pod-name> namespace=<pod-namespace> host_ip=<host-ip> pod_ip=<pod-ip> node=<node-name> created_by_kind=<created_by_kind> created_by_name=<created_by_name> uid=<pod-uid> priority_class=<priority_class> | STABLE |
| kube_pod_start_time | Gauge | pod=<pod-name> namespace=<pod-namespace> | |
| kube_pod_completion_time | Gauge | pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_owner | Gauge | pod=<pod-name> namespace=<pod-namespace> owner_kind=<owner kind> owner_name=<owner name> owner_is_controller=<whether owner is controller> | STABLE |
| kube_pod_labels | Gauge | pod=<pod-name> namespace=<pod-namespace> label_POD_LABEL=<POD_LABEL> | STABLE |
| kube_pod_status_phase | Gauge | pod=<pod-name> namespace=<pod-namespace> phase=<Pending\|Running\|Succeeded\|Failed\|Unknown> | STABLE |
| kube_pod_status_ready | Gauge | pod=<pod-name> namespace=<pod-namespace> condition=<true\|false\|unknown> | STABLE |
| kube_pod_status_scheduled | Gauge | pod=<pod-name> namespace=<pod-namespace> condition=<true\|false\|unknown> | STABLE |
| kube_pod_container_info | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> image=<image-name> image_id=<image-id> container_id=<containerid> | STABLE |
| kube_pod_container_status_waiting | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_container_status_waiting_reason | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<ContainerCreating\|CrashLoopBackOff\|ErrImagePull\|ImagePullBackOff\|CreateContainerConfigError\|InvalidImageName\|CreateContainerError> | STABLE |
| kube_pod_container_status_running | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_container_status_terminated | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_container_status_terminated_reason | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<OOMKilled\|Error\|Completed\|ContainerCannotRun\|DeadlineExceeded> | STABLE |
| kube_pod_container_status_last_terminated_reason | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<OOMKilled\|Error\|Completed\|ContainerCannotRun\|DeadlineExceeded> | STABLE |
| kube_pod_container_status_ready | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_container_status_restarts_total | Counter | container=<container-name> namespace=<pod-namespace> pod=<pod-name> | STABLE |
| kube_pod_container_resource_requests | Gauge | resource=<resource-name> unit=<resource-unit> container=<container-name> pod=<pod-name> namespace=<pod-namespace> node=<node-name> | STABLE |
| kube_pod_container_resource_limits | Gauge | resource=<resource-name> unit=<resource-unit> container=<container-name> pod=<pod-name> namespace=<pod-namespace> node=<node-name> | STABLE |
| kube_pod_created | Gauge | pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_deleted | Gauge | pod=<pod-name> namespace=<pod-namespace> | EXPERIMENTAL |
| kube_pod_restart_policy | Gauge | pod=<pod-name> namespace=<pod-namespace> type=<Always\|Never\|OnFailure> | |
| kube_pod_init_container_info | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> image=<image-name> image_id=<image-id> container_id=<containerid> | STABLE |
| kube_pod_init_container_status_waiting | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_init_container_status_waiting_reason | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<ContainerCreating\|CrashLoopBackOff\|ErrImagePull\|ImagePullBackOff\|CreateContainerConfigError> | STABLE |
| kube_pod_init_container_status_running | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_init_container_status_terminated | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_init_container_status_terminated_reason | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<OOMKilled\|Error\|Completed\|ContainerCannotRun\|DeadlineExceeded> | STABLE |
| kube_pod_init_container_status_last_terminated_reason | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<OOMKilled\|Error\|Completed\|ContainerCannotRun\|DeadlineExceeded> | STABLE |
| kube_pod_init_container_status_ready | Gauge | container=<container-name> pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_init_container_status_restarts_total | Counter | container=<container-name> namespace=<pod-namespace> pod=<pod-name> | STABLE |
| kube_pod_init_container_resource_limits | Gauge | resource=<resource-name> unit=<resource-unit> container=<container-name> pod=<pod-name> namespace=<pod-namespace> node=<node-name> | STABLE |
| kube_pod_spec_volumes_persistentvolumeclaims_info | Gauge | pod=<pod-name> namespace=<pod-namespace> volume=<volume-name> persistentvolumeclaim=<persistentvolumeclaim-claimname> | STABLE |
| kube_pod_spec_volumes_persistentvolumeclaims_readonly | Gauge | pod=<pod-name> namespace=<pod-namespace> volume=<volume-name> persistentvolumeclaim=<persistentvolumeclaim-claimname> | STABLE |
| kube_pod_status_reason | Gauge | pod=<pod-name> namespace=<pod-namespace> reason=<NodeLost\|Evicted> | EXPERIMENTAL |
| kube_pod_status_scheduled_time | Gauge | pod=<pod-name> namespace=<pod-namespace> | STABLE |
| kube_pod_status_unschedulable | Gauge | pod=<pod-name> namespace=<pod-namespace> | STABLE |
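
To make the "Labels/tags" column concrete, the following minimal Python sketch splits one scraped sample line into its metric name, labels, and value. The sample line, the `parse_sample` helper, and both regular expressions are illustrative assumptions, not part of kube-state-metrics. The parser is deliberately simplified: it does not unescape label values or handle optional timestamps.

```python
import re

# Simplified patterns for one labelled sample in the Prometheus text format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (metric_name, labels_dict, value) for one labelled sample line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


# Hypothetical sample line; the pod and namespace names are made up.
line = 'kube_pod_status_phase{namespace="default",pod="web-0",phase="Running"} 1'
name, labels, value = parse_sample(line)
print(name, labels["pod"], labels["phase"], value)
# prints: kube_pod_status_phase web-0 Running 1.0
```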
Useful metrics queries
How to retrieve non-standard Pod state
It is not straightforward to get Pod states such as "Terminating" and "Unknown", because they are not stored in a field of `Pod.Status`.
To get them, you need to combine multiple metrics (as the kubectl command-line code does).
For example:
- To get the list of Pods that are in the `Unknown` state, you can run the following PromQL query: `count(kube_pod_status_phase{phase="Running"}) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"}) by (namespace, pod)`
- For Pods in the `Terminating` state: `count(kube_pod_status_phase{phase="Running"}) by (namespace, pod) * count(kube_pod_deleted) by (namespace, pod) * count(kube_pod_status_reason{reason!="NodeLost"}) by (namespace, pod)`
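
The queries above rely on how PromQL multiplies two instant vectors: only label sets present on both sides survive, so the product acts as an intersection on `(namespace, pod)`. The Python sketch below models that behaviour in memory for the `Unknown` query. The helper names and the sample label sets are hypothetical. It also assumes a series is present only when its condition holds, because `count` counts series and ignores their values.

```python
from collections import Counter


def count_by(series, keys=("namespace", "pod")):
    """Model PromQL `count(...) by (keys)`: number of series per label group."""
    return Counter(tuple(s[k] for k in keys) for s in series)


def multiply(left, right):
    """Model `left * right` with one-to-one matching: keep groups on both sides."""
    return {group: left[group] * right[group] for group in left if group in right}


# Hypothetical label sets selected by kube_pod_status_phase{phase="Running"}.
running = [
    {"namespace": "default", "pod": "web-0", "phase": "Running"},
    {"namespace": "default", "pod": "web-1", "phase": "Running"},
]
# Hypothetical label sets selected by kube_pod_status_reason{reason="NodeLost"}.
node_lost = [
    {"namespace": "default", "pod": "web-1", "reason": "NodeLost"},
]

unknown_pods = multiply(count_by(running), count_by(node_lost))
print(unknown_pods)  # {('default', 'web-1'): 1}
```

Only `web-1` appears in both vectors, so it is the only Pod reported.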
Here is an example of a Prometheus rule that alerts on a Pod that has been in the Terminating state for more than 5 minutes.

    groups:
    - name: Pod state
      rules:
      - alert: PodsBlockInTerminatingState
        expr: count(kube_pod_status_phase{phase="Running"}) by (namespace, pod) * count(kube_pod_deleted) by (namespace, pod) * count(kube_pod_status_reason{reason!="NodeLost"}) by (namespace, pod) > 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: Pod {{ $labels.namespace }}/{{ $labels.pod }} is stuck in Terminating state.
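
Before deploying a rule like this, it can help to run its expression as an instant query and see which Pods it currently matches. The sketch below uses only the Python standard library and the Prometheus HTTP API endpoint `/api/v1/query`. The base URL `http://localhost:9090` and the helper names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Expression from the alert rule, without the "> 0" threshold.
TERMINATING_EXPR = (
    'count(kube_pod_status_phase{phase="Running"}) by (namespace, pod)'
    ' * count(kube_pod_deleted) by (namespace, pod)'
    ' * count(kube_pod_status_reason{reason!="NodeLost"}) by (namespace, pod)'
)


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def pods_from_response(payload):
    """Extract (namespace, pod) pairs from an instant-vector response body."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        (r["metric"].get("namespace", ""), r["metric"].get("pod", ""))
        for r in payload["data"]["result"]
    ]


def fetch_terminating_pods(base_url="http://localhost:9090"):
    """Query a Prometheus server (assumed address) for matching Pods."""
    url = build_query_url(base_url, TERMINATING_EXPR)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return pods_from_response(json.load(resp))
```

Calling `fetch_terminating_pods()` needs a reachable Prometheus server. `pods_from_response` can be exercised offline with a saved response body.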