Semantic conventions for GPU metrics

Status: Development

GPU metrics hw.gpu.*

Graphics Processing Unit (discrete).

hw.type MUST be set to "gpu".

All GPU metrics may include the following attributes:

| Attribute | Type | Description | Examples | Requirement Level | Stability |
|---|---|---|---|---|---|
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |
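
The following Python sketch shows one way an instrumentation might assemble this shared attribute set. The helper name `gpu_common_attributes` and its keyword-argument convention are hypothetical. It covers only the attributes in the table above, so hw.type is left to the metrics that list it (hw.errors and hw.status).

```python
from typing import Dict, Optional

# Recommended attribute keys from the table above; hw.id is the only Required one.
RECOMMENDED_KEYS = frozenset({
    "hw.driver_version",
    "hw.firmware_version",
    "hw.model",
    "hw.name",
    "hw.parent",
    "hw.serial_number",
    "hw.vendor",
})


def gpu_common_attributes(hw_id: str, **recommended: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: build the attribute set shared by all GPU metrics.

    Keyword arguments use the part after "hw." (vendor, model, ...). Values
    that are None are skipped, since Recommended attributes may simply be
    unavailable on a given host.
    """
    if not hw_id:
        raise ValueError("hw.id is Required and must be non-empty")
    attributes = {"hw.id": hw_id}
    for short_name, value in recommended.items():
        key = "hw." + short_name
        if key not in RECOMMENDED_KEYS:
            raise KeyError(f"not a GPU attribute from the table: {key}")
        if value is not None:
            attributes[key] = value
    return attributes
```

For example, `gpu_common_attributes("gpu_0", vendor="AMD", serial_number=None)` yields only `hw.id` and `hw.vendor`: in this sketch, unknown Recommended values are omitted rather than reported as empty strings.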

Metric: hw.errors (GPU)

This metric is recommended.

Number of errors encountered by the GPU.

When using this metric for GPU errors, the following attributes apply:

  • hw.type MUST be set to "gpu" to indicate that the errors are from a GPU.
  • error.type SHOULD be set to one of the following values to indicate the type of error:
    • "corrected": Errors that were detected and corrected by the GPU.
    • "uncorrected": Errors that were detected but could not be corrected by the GPU.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
|---|---|---|---|---|---|
| hw.errors | Counter | {error} | Number of errors encountered by the component. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
|---|---|---|---|---|---|
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.type | string | Type of the component [1] | battery; cpu; disk_controller | Required | Development |
| error.type | string | The type of error encountered by the component. [2] | uncorrected; zero_buffer_credit; crc; bad_sector | Conditionally Required if and only if an error has occurred | Stable |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| network.io.direction | string | Direction of network traffic for network errors. [3] | receive; transmit | Recommended | Development |

[1] hw.type: Describes the category of the hardware component for which hw.state is being reported. For example, hw.type=temperature along with hw.state=degraded would indicate that the temperature of the hardware component has been reported as degraded.

[2] error.type: The error.type SHOULD match the error code reported by the component, the canonical name of the error, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report.

[3] network.io.direction: This attribute SHOULD only be used when hw.type is set to "network" to indicate the direction of the error.


error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| _OTHER | A fallback error value to be used when the instrumentation doesn't define a custom value. | Stable |

hw.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| battery | Battery | Development |
| cpu | CPU | Development |
| disk_controller | Disk controller | Development |
| enclosure | Enclosure | Development |
| fan | Fan | Development |
| gpu | GPU | Development |
| logical_disk | Logical disk | Development |
| memory | Memory | Development |
| network | Network | Development |
| physical_disk | Physical disk | Development |
| power_supply | Power supply | Development |
| tape_drive | Tape drive | Development |
| temperature | Temperature | Development |
| voltage | Voltage | Development |

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| receive | receive | Development |
| transmit | transmit | Development |
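
Because hw.errors is a Counter, it is recorded as non-negative increments. Many devices instead expose running totals, so the sketch below converts two successive readings into increments labelled with hw.type and error.type. The function name `gpu_error_increments` is hypothetical, and how the totals are read from the device is an assumption left outside the sketch.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, Dict[str, str]]


def gpu_error_increments(hw_id: str, previous: Dict[str, int], current: Dict[str, int]) -> List[Point]:
    """Turn cumulative per-type error totals into hw.errors Counter increments.

    `previous` and `current` map an error.type value ("corrected",
    "uncorrected", ...) to a cumulative total read from the device at two
    successive collections (the reading mechanism is an assumption).
    """
    points: List[Point] = []
    for error_type in sorted(current):
        delta = current[error_type] - previous.get(error_type, 0)
        if delta < 0:
            # The device total went backwards (for example after a reset):
            # count the new total as fresh errors instead of emitting a negative value.
            delta = current[error_type]
        if delta > 0:
            # error.type is only attached when errors actually occurred.
            points.append((delta, {"hw.id": hw_id, "hw.type": "gpu", "error.type": error_type}))
    return points
```

Each `(value, attributes)` pair could then be passed to a counter created through the OpenTelemetry API, for example `counter.add(value, attributes=attributes)` in Python. Treating a decreasing total as a reset is a choice of this sketch, not something the convention specifies.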

Metric: hw.gpu.io

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
|---|---|---|---|---|---|
| hw.gpu.io | Counter | By | Received and transmitted bytes by the GPU. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
|---|---|---|---|---|---|
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| network.io.direction | string | The network IO operation direction. | receive; transmit | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| receive | receive | Development |
| transmit | transmit | Development |
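
hw.gpu.io reports both directions under a single metric name, separated by the Required network.io.direction attribute. A minimal sketch, assuming byte counts accumulated since the previous collection are already available (the function name `gpu_io_points` is hypothetical):

```python
from typing import Dict, List, Tuple

DIRECTIONS = ("receive", "transmit")


def gpu_io_points(hw_id: str, rx_bytes: int, tx_bytes: int) -> List[Tuple[int, Dict[str, str]]]:
    """Build one hw.gpu.io increment (unit By) per network.io.direction value."""
    points = []
    for direction, value in zip(DIRECTIONS, (rx_bytes, tx_bytes)):
        if value < 0:
            raise ValueError("a Counter only accepts non-negative increments")
        # network.io.direction is Required for hw.gpu.io, so it is always set.
        points.append((value, {"hw.id": hw_id, "network.io.direction": direction}))
    return points
```

One benefit of this layout is that a backend can sum over network.io.direction to obtain total GPU traffic without querying two separate metrics.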

Metric: hw.gpu.memory.limit

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
|---|---|---|---|---|---|
| hw.gpu.memory.limit | UpDownCounter | By | Size of the GPU memory. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
|---|---|---|---|---|---|
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |
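
hw.gpu.memory.limit uses the UCUM unit By (bytes), while tools often present sizes with binary suffixes. The sketch below assumes a reported string of the form "16384 MiB"; both that input format and the function name `memory_limit_point` are assumptions.

```python
from typing import Dict, Tuple

# Binary size suffixes this sketch understands (an assumption about the input format).
_MULTIPLIERS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def memory_limit_point(hw_id: str, reported_size: str) -> Tuple[int, Dict[str, str]]:
    """Parse a size such as "16384 MiB" into a hw.gpu.memory.limit value in bytes (By)."""
    number, suffix = reported_size.split()
    if suffix not in _MULTIPLIERS:
        raise ValueError(f"unsupported size suffix: {suffix}")
    size_bytes = int(number) * _MULTIPLIERS[suffix]
    if size_bytes < 0:
        raise ValueError("memory size cannot be negative")
    return size_bytes, {"hw.id": hw_id}
```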

Metric: hw.gpu.memory.utilization

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
|---|---|---|---|---|---|
| hw.gpu.memory.utilization | Gauge | 1 | Fraction of GPU memory used. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
|---|---|---|---|---|---|
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |
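
Since hw.gpu.memory.utilization has unit 1, it is a fraction between 0 and 1 rather than a percentage. One way to derive it, assuming the usage and limit values reported for hw.gpu.memory.usage and hw.gpu.memory.limit are at hand (the function name `memory_utilization` is hypothetical):

```python
def memory_utilization(usage_bytes: int, limit_bytes: int) -> float:
    """Return the fraction of GPU memory used, in [0.0, 1.0].

    The unit "1" denotes a dimensionless ratio, so 25% is reported as 0.25, not 25.
    """
    if limit_bytes <= 0:
        raise ValueError("memory limit must be positive")
    fraction = usage_bytes / limit_bytes
    # Clamp, because usage and limit may be sampled at slightly different moments.
    return min(max(fraction, 0.0), 1.0)
```

Clamping is a defensive choice of this sketch for readings taken at slightly different times; the convention itself does not prescribe it.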

Metric: hw.gpu.memory.usage

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
|---|---|---|---|---|---|
| hw.gpu.memory.usage | UpDownCounter | By | GPU memory used. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
|---|---|---|---|---|---|
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |
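
hw.gpu.memory.usage is an UpDownCounter, and a synchronous UpDownCounter records changes rather than absolute values. If a device reports its current usage, the instrumentation can remember the last reading per hw.id and add the difference. The class `UsageDeltaTracker` below is a hypothetical sketch of that bookkeeping:

```python
from typing import Dict


class UsageDeltaTracker:
    """Convert absolute memory-usage readings (bytes) into UpDownCounter deltas.

    The last reading per hw.id is remembered so that each new reading can be
    turned into a positive or negative change.
    """

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def delta(self, hw_id: str, used_bytes: int) -> int:
        previous = self._last.get(hw_id, 0)
        self._last[hw_id] = used_bytes
        return used_bytes - previous
```

An alternative is an asynchronous instrument, such as one created with `Meter.create_observable_up_down_counter` in the OpenTelemetry Python API, whose callback reports the absolute value directly and needs no delta tracking.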

Metric: hw.gpu.utilization

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
|---|---|---|---|---|---|
| hw.gpu.utilization | Gauge | 1 | Fraction of time spent in a specific task. | Development | |

| Attribute | Type | Description | Examples | Requirement Level | Stability |
|---|---|---|---|---|---|
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.driver_version | string | Driver version for the hardware component | 10.2.1-3 | Recommended | Development |
| hw.firmware_version | string | Firmware version of the hardware component | 2.0.1 | Recommended | Development |
| hw.gpu.task | string | Type of task the GPU is performing | decoder; encoder; general | Recommended | Development |
| hw.model | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery | Recommended | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |
| hw.serial_number | string | Serial number of the hardware component | CNFCP0123456789 | Recommended | Development |
| hw.vendor | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo | Recommended | Development |

hw.gpu.task has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| decoder | Decoder | Development |
| encoder | Encoder | Development |
| general | General | Development |
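
hw.gpu.utilization has unit 1 and is broken down by hw.gpu.task. The sketch below assumes a data source that reports per-task busy percentages under its own labels; the labels in `TASK_ALIASES` and the function name `utilization_points` are hypothetical. It maps those labels onto the well-known values, which MUST be used when one applies, and converts percentages to fractions.

```python
from typing import Dict, List, Tuple

# Hypothetical raw task labels from a data source, mapped to well-known hw.gpu.task values.
TASK_ALIASES = {"gfx": "general", "enc": "encoder", "dec": "decoder"}


def utilization_points(hw_id: str, percent_by_task: Dict[str, float]) -> List[Tuple[float, Dict[str, str]]]:
    """Convert per-task busy percentages (0-100) into hw.gpu.utilization fractions (0-1)."""
    points: List[Tuple[float, Dict[str, str]]] = []
    for raw_task, percent in sorted(percent_by_task.items()):
        if not 0.0 <= percent <= 100.0:
            raise ValueError(f"percentage out of range for {raw_task!r}: {percent}")
        # Use the well-known value when one applies; otherwise keep the label as a custom value.
        task = TASK_ALIASES.get(raw_task, raw_task)
        points.append((percent / 100.0, {"hw.id": hw_id, "hw.gpu.task": task}))
    return points
```

Because the values are sampled readings rather than sums, they fit a Gauge; in the OpenTelemetry Python API they could be returned from a callback registered with `Meter.create_observable_gauge`.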

Metric: hw.status (GPU)

This metric is recommended.

Operational status: 1 (true) or 0 (false) for each of the possible states.

When using this metric for GPU status, the following attributes MUST be set:

  • hw.type MUST be set to "gpu" to indicate that the status is for a GPU.
  • hw.state MUST be set to one of the following values to indicate the GPU state:
    • "ok": The GPU is operating normally.
    • "degraded": The GPU is operating with reduced functionality or performance.
    • "failed": The GPU has failed and is not operational.
    • "predicted_failure": The GPU is currently operational but is predicted to fail soon.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
|---|---|---|---|---|---|
| hw.status | UpDownCounter | 1 | Operational status: 1 (true) or 0 (false) for each of the possible states. [1] | Development | |

[1]: hw.status is currently specified as an UpDownCounter but would ideally be represented using a StateSet as defined in OpenMetrics. This semantic convention will be updated once StateSet is specified in OpenTelemetry. This planned change is not expected to have any consequence on the way users query their timeseries backend to retrieve the values of hw.status over time.

| Attribute | Type | Description | Examples | Requirement Level | Stability |
|---|---|---|---|---|---|
| hw.id | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 | Required | Development |
| hw.state | string | The current state of the component | degraded; failed; needs_cleaning | Required | Development |
| hw.type | string | Type of the component [1] | battery; cpu; disk_controller | Required | Development |
| hw.name | string | An easily-recognizable name for the hardware component | eth0 | Recommended | Development |
| hw.parent | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 | Recommended | Development |

[1] hw.type: Describes the category of the hardware component for which hw.state is being reported. For example, hw.type=temperature along with hw.state=degraded would indicate that the temperature of the hardware component has been reported as degraded.


hw.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| degraded | Degraded | Development |
| failed | Failed | Development |
| needs_cleaning | Needs Cleaning | Development |
| ok | OK | Development |
| predicted_failure | Predicted Failure | Development |

hw.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
|---|---|---|
| battery | Battery | Development |
| cpu | CPU | Development |
| disk_controller | Disk controller | Development |
| enclosure | Enclosure | Development |
| fan | Fan | Development |
| gpu | GPU | Development |
| logical_disk | Logical disk | Development |
| memory | Memory | Development |
| network | Network | Development |
| physical_disk | Physical disk | Development |
| power_supply | Power supply | Development |
| tape_drive | Tape drive | Development |
| temperature | Temperature | Development |
| voltage | Voltage | Development |
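
With the state-set encoding of hw.status, every collection reports one value per possible state: 1 for the current state and 0 for all others. A minimal Python sketch for the GPU states listed earlier in this section (the function name `gpu_status_points` is hypothetical; needs_cleaning is left out because the GPU-specific list does not include it):

```python
from typing import Dict, List, Tuple

# GPU states from the hw.status (GPU) list.
GPU_STATES = ("ok", "degraded", "failed", "predicted_failure")


def gpu_status_points(hw_id: str, current_state: str) -> List[Tuple[int, Dict[str, str]]]:
    """Emit one hw.status value per state: 1 for the current state, 0 for the others."""
    if current_state not in GPU_STATES:
        raise ValueError(f"unknown GPU state: {current_state}")
    return [
        (1 if state == current_state else 0,
         {"hw.id": hw_id, "hw.type": "gpu", "hw.state": state})
        for state in GPU_STATES
    ]
```

These absolute values fit an asynchronous UpDownCounter, for example by returning `Observation(value, attributes)` objects from a callback registered with `create_observable_up_down_counter` in the OpenTelemetry Python API. With a synchronous UpDownCounter, the instrumentation would instead add -1 to the previous state and +1 to the new one on each transition.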