
# Pod Metrics

| Metric name | Metric type | Labels/tags | Status |
| ----------- | ----------- | ----------- | ------ |
| kube_pod_info | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `host_ip=<host-ip>` <br> `pod_ip=<pod-ip>` <br> `node=<node-name>` <br> `created_by_kind=<created_by_kind>` <br> `created_by_name=<created_by_name>` <br> `uid=<pod-uid>` <br> `priority_class=<priority_class>` | STABLE |
| kube_pod_start_time | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` | |
| kube_pod_completion_time | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_owner | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `owner_kind=<owner-kind>` <br> `owner_name=<owner-name>` <br> `owner_is_controller=<whether owner is controller>` | STABLE |
| kube_pod_labels | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `label_POD_LABEL=<POD_LABEL>` | STABLE |
| kube_pod_status_phase | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `phase=<Pending\|Running\|Succeeded\|Failed\|Unknown>` | STABLE |
| kube_pod_status_ready | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `condition=<true\|false\|unknown>` | STABLE |
| kube_pod_status_scheduled | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `condition=<true\|false\|unknown>` | STABLE |
| kube_pod_container_info | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `image=<image-name>` <br> `image_id=<image-id>` <br> `container_id=<container-id>` | STABLE |
| kube_pod_container_status_waiting | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_container_status_waiting_reason | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<ContainerCreating\|CrashLoopBackOff\|ErrImagePull\|ImagePullBackOff\|CreateContainerConfigError\|InvalidImageName\|CreateContainerError>` | STABLE |
| kube_pod_container_status_running | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_container_status_terminated | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_container_status_terminated_reason | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<OOMKilled\|Error\|Completed\|ContainerCannotRun\|DeadlineExceeded>` | STABLE |
| kube_pod_container_status_last_terminated_reason | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<OOMKilled\|Error\|Completed\|ContainerCannotRun\|DeadlineExceeded>` | STABLE |
| kube_pod_container_status_ready | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_container_status_restarts_total | Counter | `container=<container-name>` <br> `namespace=<pod-namespace>` <br> `pod=<pod-name>` | STABLE |
| kube_pod_container_resource_requests | Gauge | `resource=<resource-name>` <br> `unit=<resource-unit>` <br> `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `node=<node-name>` | STABLE |
| kube_pod_container_resource_limits | Gauge | `resource=<resource-name>` <br> `unit=<resource-unit>` <br> `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `node=<node-name>` | STABLE |
| kube_pod_created | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_deleted | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` | EXPERIMENTAL |
| kube_pod_restart_policy | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `type=<Always\|Never\|OnFailure>` | |
| kube_pod_init_container_info | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `image=<image-name>` <br> `image_id=<image-id>` <br> `container_id=<container-id>` | STABLE |
| kube_pod_init_container_status_waiting | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_init_container_status_waiting_reason | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<ContainerCreating\|CrashLoopBackOff\|ErrImagePull\|ImagePullBackOff\|CreateContainerConfigError>` | STABLE |
| kube_pod_init_container_status_running | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_init_container_status_terminated | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_init_container_status_terminated_reason | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<OOMKilled\|Error\|Completed\|ContainerCannotRun\|DeadlineExceeded>` | STABLE |
| kube_pod_init_container_status_last_terminated_reason | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<OOMKilled\|Error\|Completed\|ContainerCannotRun\|DeadlineExceeded>` | STABLE |
| kube_pod_init_container_status_ready | Gauge | `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_init_container_status_restarts_total | Counter | `container=<container-name>` <br> `namespace=<pod-namespace>` <br> `pod=<pod-name>` | STABLE |
| kube_pod_init_container_resource_limits | Gauge | `resource=<resource-name>` <br> `unit=<resource-unit>` <br> `container=<container-name>` <br> `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `node=<node-name>` | STABLE |
| kube_pod_spec_volumes_persistentvolumeclaims_info | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `volume=<volume-name>` <br> `persistentvolumeclaim=<persistentvolumeclaim-claimname>` | STABLE |
| kube_pod_spec_volumes_persistentvolumeclaims_readonly | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `volume=<volume-name>` <br> `persistentvolumeclaim=<persistentvolumeclaim-claimname>` | STABLE |
| kube_pod_status_reason | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` <br> `reason=<NodeLost\|Evicted>` | EXPERIMENTAL |
| kube_pod_status_scheduled_time | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
| kube_pod_status_unschedulable | Gauge | `pod=<pod-name>` <br> `namespace=<pod-namespace>` | STABLE |
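Several of the gauges above, such as `kube_pod_status_phase`, `kube_pod_status_ready` and `kube_pod_status_scheduled`, encode an enumerated state as one series per possible value. The series for the current value is 1 and all the others are 0. The Go sketch below illustrates this encoding for `kube_pod_status_phase` by printing the series for a single Pod in the Prometheus text exposition format. It is not kube-state-metrics' own code: the `phaseSeries` helper and the Pod and namespace names are hypothetical, and the label order is only illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// podPhases lists the values of the `phase` label of kube_pod_status_phase.
var podPhases = []string{"Pending", "Running", "Succeeded", "Failed", "Unknown"}

// phaseSeries (hypothetical helper) renders one exposition-format line per
// phase for a single Pod. The line for the Pod's current phase has value 1;
// every other line has value 0.
func phaseSeries(namespace, pod, current string) []string {
	lines := make([]string, 0, len(podPhases))
	for _, phase := range podPhases {
		value := 0
		if phase == current {
			value = 1
		}
		lines = append(lines, fmt.Sprintf(
			`kube_pod_status_phase{namespace=%q,pod=%q,phase=%q} %d`,
			namespace, pod, phase, value))
	}
	return lines
}

func main() {
	// Prints five series for the Pod; only phase="Running" has value 1.
	fmt.Println(strings.Join(phaseSeries("default", "web-0", "Running"), "\n"))
}
```

Because every Pod has a series for every phase, a query has to look at the series value (for example with `sum` or `== 1`), not merely at whether a series exists.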

## Useful metrics queries

### How to retrieve non-standard Pod states

Some Pod states, such as "Terminating" and "Unknown", cannot be read directly because they are not stored in any field of `Pod.Status`.

To get them, you need to combine multiple metrics, much as the `kubectl` command-line code derives them from several Pod fields.
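The combination logic can be sketched in Go as a plain function over the Pod fields that back the metrics. This sketch mirrors the metric combinations used in the PromQL queries in this section. It does not reproduce kubectl's actual source, and the `podSnapshot` type and `displayState` function are hypothetical illustrations, not Kubernetes API types.

```go
package main

import "fmt"

// podSnapshot is a hypothetical type holding the Pod fields that back the
// metrics used in this section. It is not a Kubernetes API type.
type podSnapshot struct {
	Phase   string // source of kube_pod_status_phase
	Reason  string // source of kube_pod_status_reason, e.g. "NodeLost" or "Evicted"
	Deleted bool   // true when kube_pod_deleted is exported for the Pod
}

// displayState combines the fields the way the queries combine the metrics:
// a Running Pod whose reason is NodeLost is reported as "Unknown", and a
// Running Pod that is being deleted for any other reason is reported as
// "Terminating". Otherwise the phase is returned unchanged.
func displayState(p podSnapshot) string {
	switch {
	case p.Phase == "Running" && p.Reason == "NodeLost":
		return "Unknown"
	case p.Phase == "Running" && p.Deleted:
		return "Terminating"
	default:
		return p.Phase
	}
}

func main() {
	fmt.Println(displayState(podSnapshot{Phase: "Running", Reason: "NodeLost"}))
	fmt.Println(displayState(podSnapshot{Phase: "Running", Deleted: true}))
	fmt.Println(displayState(podSnapshot{Phase: "Pending"}))
}
```

The NodeLost check comes first, so a deleted Pod on a lost node is reported as "Unknown" rather than "Terminating". This matches the `reason` exclusion in the Terminating query.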

For example:

- To list Pods in the Unknown state, use the following PromQL query: `sum(kube_pod_status_phase{phase="Running"}) by (namespace, pod) * sum(kube_pod_status_reason{reason="NodeLost"}) by (namespace, pod) > 0`

- For Pods in the Terminating state: `sum(kube_pod_status_phase{phase="Running"}) by (namespace, pod) * count(kube_pod_deleted) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod) > 0`
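When you combine these metrics, the choice between `count` and `sum` matters. `count` counts series regardless of their value, so applied to an enumerated gauge like `kube_pod_status_phase{phase="Running"}` it returns 1 for every Pod. `sum` adds the 0/1 values and returns 1 only for Pods that are actually in that phase. `kube_pod_deleted` behaves differently because it is only exported for Pods that are being deleted, so counting it is enough. The Go sketch below models the two aggregations over hypothetical samples. The `sample` and `key` types and the `countBy` and `sumBy` functions are illustrative names, not part of Prometheus.

```go
package main

import "fmt"

// sample is one series of kube_pod_status_phase{phase="Running"} for a Pod.
type sample struct {
	Namespace, Pod string
	Value          float64 // 1 if the Pod is Running, 0 otherwise
}

// key identifies a Pod, like the `by (namespace, pod)` grouping.
type key struct{ Namespace, Pod string }

// countBy mimics PromQL count(...) by (namespace, pod): it counts series
// per group and ignores their values.
func countBy(samples []sample) map[key]float64 {
	out := map[key]float64{}
	for _, s := range samples {
		out[key{s.Namespace, s.Pod}]++
	}
	return out
}

// sumBy mimics PromQL sum(...) by (namespace, pod): it adds the values.
func sumBy(samples []sample) map[key]float64 {
	out := map[key]float64{}
	for _, s := range samples {
		out[key{s.Namespace, s.Pod}] += s.Value
	}
	return out
}

func main() {
	running := []sample{
		{"default", "web-0", 1}, // this Pod is Running
		{"default", "job-1", 0}, // this Pod is in some other phase
	}
	fmt.Println(countBy(running)) // both Pods get 1
	fmt.Println(sumBy(running))   // only web-0 gets 1
}
```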

Here is an example Prometheus alerting rule that fires when a Pod has been in the Terminating state for more than 5 minutes:

```yaml
groups:
- name: Pod state
  rules:
  - alert: PodsBlockInTerminatingState
    # Running phase (sum of 0/1 values) AND being deleted AND reason is not NodeLost
    expr: |
      sum(kube_pod_status_phase{phase="Running"}) by (namespace, pod)
        * count(kube_pod_deleted) by (namespace, pod)
        * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod)
        > 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is stuck in Terminating state."
```