Pod Metrics
| Metric name | Metric type | Description | Unit (where applicable) | Labels/tags | Status | Opt-in |
|---|---|---|---|---|---|---|
| kube_pod_annotations | Gauge | Kubernetes annotations converted to Prometheus labels, controlled via --metric-annotations-allowlist | | pod=<pod-name> namespace=<pod-namespace> annotation_POD_ANNOTATION=<POD_ANNOTATION> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_info | Gauge | Information about the pod | | pod=<pod-name> namespace=<pod-namespace> host_ip=<host-ip> pod_ip=<pod-ip> node=<node-name> created_by_kind=<created_by_kind> created_by_name=<created_by_name> uid=<pod-uid> priority_class=<priority_class> host_network=<host_network> | STABLE | - |
| kube_pod_ips | Gauge | Pod IP addresses | | pod=<pod-name> namespace=<pod-namespace> ip=<pod-ip-address> ip_family=<4 OR 6> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_start_time | Gauge | Start time in unix timestamp for a pod | seconds | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_completion_time | Gauge | Completion time in unix timestamp for a pod | seconds | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_owner | Gauge | Information about the Pod's owner | | pod=<pod-name> namespace=<pod-namespace> owner_kind=<owner kind> owner_name=<owner name> owner_is_controller=<whether owner is controller> uid=<pod-uid> | STABLE | - |
| kube_pod_labels | Gauge | Kubernetes labels converted to Prometheus labels, controlled via --metric-labels-allowlist | | pod=<pod-name> namespace=<pod-namespace> label_POD_LABEL=<POD_LABEL> uid=<pod-uid> | STABLE | - |
| kube_pod_nodeselectors | Gauge | Describes the Pod nodeSelectors | | pod=<pod-name> namespace=<pod-namespace> nodeselector_NODE_SELECTOR=<NODE_SELECTOR> uid=<pod-uid> | EXPERIMENTAL | Opt-in |
| kube_pod_status_phase | Gauge | The pod's current phase | | pod=<pod-name> namespace=<pod-namespace> phase=<Pending\|Running\|Succeeded\|Failed\|Unknown> uid=<pod-uid> | STABLE | - |
| kube_pod_status_qos_class | Gauge | The pod's current qosClass | | pod=<pod-name> namespace=<pod-namespace> qos_class=<BestEffort\|Burstable\|Guaranteed> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_status_ready | Gauge | Describes whether the pod is ready to serve requests | | pod=<pod-name> namespace=<pod-namespace> condition=<true\|false\|unknown> uid=<pod-uid> | STABLE | - |
| kube_pod_status_scheduled | Gauge | Describes the status of the scheduling process for the pod | | pod=<pod-name> namespace=<pod-namespace> condition=<true\|false\|unknown> uid=<pod-uid> | STABLE | - |
| kube_pod_container_info | Gauge | Information about a container in a pod | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> image=<image-name> image_id=<image-id> image_spec=<image-spec> container_id=<containerid> uid=<pod-uid> | STABLE | - |
| kube_pod_container_status_waiting | Gauge | Describes whether the container is currently in waiting state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_container_status_waiting_reason | Gauge | Describes the reason the container is currently in waiting state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<container-waiting-reason> uid=<pod-uid> | STABLE | - |
| kube_pod_container_status_running | Gauge | Describes whether the container is currently in running state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_container_state_started | Gauge | Start time in unix timestamp for a pod container | seconds | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_container_status_terminated | Gauge | Describes whether the container is currently in terminated state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_container_status_terminated_reason | Gauge | Describes the reason the container is currently in terminated state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<container-terminated-reason> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_container_status_last_terminated_reason | Gauge | Describes the last reason the container was in terminated state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<last-terminated-reason> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_container_status_last_terminated_exitcode | Gauge | Describes the exit code for the last container in terminated state. | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_container_status_last_terminated_timestamp | Gauge | Last terminated time for a pod container in unix timestamp. | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_container_status_ready | Gauge | Describes whether the container's readiness check succeeded | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_status_initialized_time | Gauge | Time when the pod is initialized. | seconds | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_status_ready_time | Gauge | Time when pod passed readiness probes. | seconds | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_status_container_ready_time | Gauge | Time when the container of the pod entered Ready state. | seconds | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_container_status_restarts_total | Counter | The number of container restarts per container | | container=<container-name> namespace=<pod-namespace> pod=<pod-name> uid=<pod-uid> | STABLE | - |
| kube_pod_container_resource_requests | Gauge | The amount of a resource requested by a container. It is recommended to use the kube_pod_resource_requests metric exposed by kube-scheduler instead, as it is more precise. | cpu=<core> memory=<bytes> | resource=<resource-name> unit=<resource-unit> container=<container-name> pod=<pod-name> namespace=<pod-namespace> node=<node-name> uid=<pod-uid> | BETA | - |
| kube_pod_container_resource_limits | Gauge | The limit set on a resource for a container. It is recommended to use the kube_pod_resource_limits metric exposed by kube-scheduler instead, as it is more precise. | cpu=<core> memory=<bytes> | resource=<resource-name> unit=<resource-unit> container=<container-name> pod=<pod-name> namespace=<pod-namespace> node=<node-name> uid=<pod-uid> | BETA | - |
| kube_pod_overhead_cpu_cores | Gauge | The pod overhead, in CPU cores, associated with running a pod | core | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_overhead_memory_bytes | Gauge | The pod overhead, in memory, associated with running a pod | bytes | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_runtimeclass_name_info | Gauge | The runtimeclass associated with the pod | | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_created | Gauge | Unix creation timestamp | seconds | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_deletion_timestamp | Gauge | Unix deletion timestamp | seconds | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_restart_policy | Gauge | Describes the restart policy in use by this pod | | pod=<pod-name> namespace=<pod-namespace> type=<Always\|Never\|OnFailure> uid=<pod-uid> | STABLE | - |
| kube_pod_init_container_info | Gauge | Information about an init container in a pod | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> image=<image-name> image_id=<image-id> image_spec=<image-spec> container_id=<containerid> uid=<pod-uid> restart_policy=<restart-policy> | STABLE | - |
| kube_pod_init_container_status_waiting | Gauge | Describes whether the init container is currently in waiting state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_init_container_status_waiting_reason | Gauge | Describes the reason the init container is currently in waiting state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<container-waiting-reason> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_init_container_status_running | Gauge | Describes whether the init container is currently in running state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_init_container_status_terminated | Gauge | Describes whether the init container is currently in terminated state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_init_container_status_terminated_reason | Gauge | Describes the reason the init container is currently in terminated state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<container-terminated-reason> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_init_container_status_last_terminated_reason | Gauge | Describes the last reason the init container was in terminated state | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> reason=<last-terminated-reason> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_init_container_status_ready | Gauge | Describes whether the init container's readiness check succeeded | | container=<container-name> pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_init_container_status_restarts_total | Counter | The number of restarts for the init container | integer | container=<container-name> namespace=<pod-namespace> pod=<pod-name> uid=<pod-uid> | STABLE | - |
| kube_pod_init_container_resource_limits | Gauge | The limit set on a resource for an init container | cpu=<core> memory=<bytes> | resource=<resource-name> unit=<resource-unit> container=<container-name> pod=<pod-name> namespace=<pod-namespace> node=<node-name> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_init_container_resource_requests | Gauge | The amount of a resource requested by an init container | cpu=<core> memory=<bytes> | resource=<resource-name> unit=<resource-unit> container=<container-name> pod=<pod-name> namespace=<pod-namespace> node=<node-name> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_spec_volumes_persistentvolumeclaims_info | Gauge | Information about persistentvolumeclaim volumes in a pod | | pod=<pod-name> namespace=<pod-namespace> volume=<volume-name> persistentvolumeclaim=<persistentvolumeclaim-claimname> uid=<pod-uid> | STABLE | - |
| kube_pod_spec_volumes_persistentvolumeclaims_readonly | Gauge | Describes whether a persistentvolumeclaim is mounted read only | bool | pod=<pod-name> namespace=<pod-namespace> volume=<volume-name> persistentvolumeclaim=<persistentvolumeclaim-claimname> uid=<pod-uid> | STABLE | - |
| kube_pod_status_reason | Gauge | The pod status reasons | | pod=<pod-name> namespace=<pod-namespace> reason=<Evicted\|NodeAffinity\|NodeLost\|Shutdown\|UnexpectedAdmissionError> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_status_scheduled_time | Gauge | Unix timestamp when pod moved into scheduled status | seconds | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_status_unschedulable | Gauge | Describes the unschedulable status for the pod | | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | STABLE | - |
| kube_pod_status_unscheduled_time | Gauge | Unix timestamp when pod moved into unscheduled status | seconds | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> | EXPERIMENTAL | - |
| kube_pod_tolerations | Gauge | Information about the pod tolerations | | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> key=<toleration-key> operator=<toleration-operator> value=<toleration-value> effect=<toleration-effect> toleration_seconds=<toleration-seconds> | EXPERIMENTAL | - |
| kube_pod_service_account | Gauge | The service account for a pod | | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> service_account=<service_account> | EXPERIMENTAL | - |
| kube_pod_scheduler | Gauge | The scheduler for a pod | | pod=<pod-name> namespace=<pod-namespace> uid=<pod-uid> name=<scheduler-name> | EXPERIMENTAL | - |

Useful metrics queries
How to retrieve non-standard Pod state
For certain states such as "Terminating" and "Unknown", retrieving the Pod state is not straightforward, because these states are not stored in a field of Pod.Status.
To mimic the logic used by the kubectl command line, you need to combine multiple metrics.
For example:
- To get the list of pods that are in the `Unknown` state, you can run the following PromQL query: `sum(kube_pod_status_phase{phase="Unknown"}) by (namespace, pod) or (count(kube_pod_deletion_timestamp) by (namespace, pod) * sum(kube_pod_status_reason{reason="NodeLost"}) by(namespace, pod))`
- For Pods in `Terminating` state: `count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod)`
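
The way these two queries combine metrics can be easier to follow as plain set logic. The sketch below is only an illustration of the intended classification, not how Prometheus evaluates the expressions: it assumes each input is the set of `(namespace, pod)` pairs for which the corresponding series exists (or has value 1), and the function name `classify` and its parameters are hypothetical.

```python
def classify(phase_unknown, has_deletion_ts, node_lost):
    """Approximate the two PromQL queries with set operations.

    phase_unknown   -- pods where kube_pod_status_phase{phase="Unknown"} is 1
    has_deletion_ts -- pods that expose kube_pod_deletion_timestamp
    node_lost       -- pods where kube_pod_status_reason{reason="NodeLost"} is 1
    Every pod is a (namespace, pod) tuple; the inputs are assumed sample data.
    """
    # Unknown: the phase is Unknown, or the pod is being deleted on a lost node.
    unknown = phase_unknown | (has_deletion_ts & node_lost)
    # Terminating: the pod is being deleted and its node is not reported lost.
    terminating = has_deletion_ts - node_lost
    return unknown, terminating


if __name__ == "__main__":
    unknown, terminating = classify(
        phase_unknown={("default", "a")},
        has_deletion_ts={("default", "b"), ("default", "c")},
        node_lost={("default", "b")},
    )
    print(sorted(unknown))
    print(sorted(terminating))
```

A pod with a deletion timestamp therefore lands in exactly one of the two groups, depending on whether the `NodeLost` reason is set.
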
Here is an example of a Prometheus rule that can be used to alert on a Pod that has been in the Terminating state for more than 5m.

    groups:
    - name: Pod state
      rules:
      - alert: PodsBlockedInTerminatingState
        expr: count(kube_pod_deletion_timestamp) by (namespace, pod) * count(kube_pod_status_reason{reason="NodeLost"} == 0) by (namespace, pod) > 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: Pod {{$labels.namespace}}/{{$labels.pod}} blocked in Terminating state.
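
To make the effect of `for: 5m` concrete, here is a minimal sketch of the hold-duration behaviour: the alert is pending from the first evaluation at which the expression is true and fires once it has stayed true for the whole duration. This is a simplified model under stated assumptions, not Prometheus code; the function `first_firing_time` and the sample evaluation times are hypothetical.

```python
from datetime import timedelta


def first_firing_time(evaluations, hold=timedelta(minutes=5)):
    """Return the evaluation timestamp at which the alert would fire, or None.

    evaluations -- (timestamp_seconds, expr_is_true) pairs in chronological
                   order, one per rule evaluation (assumed sample data).
    hold        -- the rule's `for` duration.
    """
    active_since = None
    for ts, is_true in evaluations:
        if not is_true:
            # The expression stopped matching, so the pending state resets.
            active_since = None
            continue
        if active_since is None:
            # First evaluation at which the expression matches: alert is pending.
            active_since = ts
        if ts - active_since >= hold.total_seconds():
            return ts
    return None


if __name__ == "__main__":
    # A pod seen as Terminating at every 60-second evaluation.
    steady = [(t, True) for t in range(0, 421, 60)]
    print(first_firing_time(steady))
```

In this model, a single evaluation at which the pod no longer matches restarts the five-minute window.
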