Update Spark Operator Prometheus Metrics Guide (#3983)

Signed-off-by: Ratnopam Chakrabarti <ratnopamc@yahoo.com>
Ratnopam Charabarti 2025-02-11 03:27:29 -08:00 committed by GitHub
parent 39839caf8a
commit 51e4aeb21c
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
1 changed file with 19 additions and 19 deletions


@@ -228,30 +228,30 @@ If enabled, the operator generates the following metrics:
 | Metric | Description |
 | ------------- | ------------- |
-| `spark_app_count` | Total number of SparkApplication handled by the Operator.|
-| `spark_app_submit_count` | Total number of SparkApplication spark-submitted by the Operator.|
-| `spark_app_success_count` | Total number of SparkApplication which completed successfully.|
-| `spark_app_failure_count` | Total number of SparkApplication which failed to complete. |
-| `spark_app_running_count` | Total number of SparkApplication which are currently running.|
-| `spark_app_success_execution_time_microseconds` | Execution time for applications which succeeded.|
-| `spark_app_failure_execution_time_microseconds` | Execution time for applications which failed. |
-| `spark_app_start_latency_microseconds` | Start latency of SparkApplication as type of [Prometheus Summary](https://prometheus.io/docs/concepts/metric_types/#summary). |
-| `spark_app_start_latency_seconds` | Start latency of SparkApplication as type of [Prometheus Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram). |
-| `spark_app_executor_success_count` | Total number of Spark Executors which completed successfully. |
-| `spark_app_executor_failure_count` | Total number of Spark Executors which failed. |
-| `spark_app_executor_running_count` | Total number of Spark Executors which are currently running. |
+| `spark_application_count` | Total number of SparkApplication handled by the Operator.|
+| `spark_application_submit_count` | Total number of SparkApplication spark-submitted by the Operator.|
+| `spark_application_success_count` | Total number of SparkApplication which completed successfully.|
+| `spark_application_failure_count` | Total number of SparkApplication which failed to complete. |
+| `spark_application_running_count` | Total number of SparkApplication which are currently running.|
+| `spark_application_success_execution_time_seconds` | Execution time for applications which succeeded.|
+| `spark_application_failure_execution_time_seconds` | Execution time for applications which failed. |
+| `spark_application_start_latency_seconds` | Start latency of SparkApplication as type of [Prometheus Summary](https://prometheus.io/docs/concepts/metric_types/#summary). |
+| `spark_application_start_latency_seconds` | Start latency of SparkApplication as type of [Prometheus Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram). |
+| `spark_executor_success_count` | Total number of Spark Executors which completed successfully. |
+| `spark_executor_failure_count` | Total number of Spark Executors which failed. |
+| `spark_executor_running_count` | Total number of Spark Executors which are currently running. |
 #### Work Queue Metrics
 | Metric | Description |
 | ------------- | ------------- |
-| `spark_application_controller_depth` | Current depth of workqueue |
-| `spark_application_controller_adds` | Total number of adds handled by workqueue |
-| `spark_application_controller_latency` | Latency for workqueue |
-| `spark_application_controller_work_duration` | How long processing an item from workqueue takes |
-| `spark_application_controller_retries` | Total number of retries handled by workqueue |
-| `spark_application_controller_unfinished_work_seconds` | Unfinished work in seconds |
-| `spark_application_controller_longest_running_processor_microseconds` | Longest running processor in microseconds |
+| `workqueue_depth` | Current depth of workqueue |
+| `workqueue_adds_total` | Total number of adds handled by workqueue |
+| `workqueue_queue_duration_seconds_bucket` | How long in seconds an item stays in workqueue before being requested |
+| `workqueue_work_duration_seconds_bucket` | How long in seconds processing an item from workqueue takes |
+| `workqueue_retries_total` | Total number of retries handled by workqueue |
+| `workqueue_unfinished_work_seconds` | Unfinished work in seconds |
+| `workqueue_longest_running_processor_seconds` | Longest running processor in seconds |
 The following is a list of all the configurations the operators supports for metrics:
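This change renames every Spark application metric. The `spark_app_` prefix becomes `spark_application_`, the executor metrics drop the `app_` segment, and the `*_microseconds` metrics become `*_seconds`. As a result, existing dashboards and alert rules must have their queries updated. The Python sketch below shows one way to do that migration. It assumes the queries are available as plain strings and pairs each removed row with the added row in the same table position. `rename_metrics` and `microseconds_to_seconds` are hypothetical helpers, and the `_bucket`/`_sum`/`_count` suffixes are assumed to follow the standard Prometheus Summary/Histogram series naming. The work-queue rows are left out because several of the new names there are `_bucket` histogram series rather than one-to-one renames.

```python
import re

# Hypothetical migration table: each removed (-) row of the Spark application
# metrics table paired with the added (+) row in the same position.
OLD_TO_NEW = {
    "spark_app_count": "spark_application_count",
    "spark_app_submit_count": "spark_application_submit_count",
    "spark_app_success_count": "spark_application_success_count",
    "spark_app_failure_count": "spark_application_failure_count",
    "spark_app_running_count": "spark_application_running_count",
    "spark_app_success_execution_time_microseconds": "spark_application_success_execution_time_seconds",
    "spark_app_failure_execution_time_microseconds": "spark_application_failure_execution_time_seconds",
    # The new table lists the Summary and the Histogram start-latency metrics
    # under the same name, so both old names map to it.
    "spark_app_start_latency_microseconds": "spark_application_start_latency_seconds",
    "spark_app_start_latency_seconds": "spark_application_start_latency_seconds",
    "spark_app_executor_success_count": "spark_executor_success_count",
    "spark_app_executor_failure_count": "spark_executor_failure_count",
    "spark_app_executor_running_count": "spark_executor_running_count",
}

# Try longer names first; the trailing \b also stops a name from matching
# inside a longer identifier. The optional group keeps series suffixes.
_PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, OLD_TO_NEW), key=len, reverse=True))
    + r")(_bucket|_sum|_count)?\b"
)


def rename_metrics(expr: str) -> str:
    """Rewrite old metric names in a PromQL expression to the new names."""
    return _PATTERN.sub(lambda m: OLD_TO_NEW[m.group(1)] + (m.group(2) or ""), expr)


def microseconds_to_seconds(value: float) -> float:
    """Rescale a threshold written against an old *_microseconds metric."""
    return value / 1_000_000


print(rename_metrics("rate(spark_app_failure_count[5m]) > 0"))
# prints: rate(spark_application_failure_count[5m]) > 0
```

Renaming alone does not fix alert thresholds written against the old microsecond metrics. Those values also need rescaling; for example, `microseconds_to_seconds(300_000_000)` gives `300.0` seconds.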
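The new `workqueue_queue_duration_seconds_bucket` and `workqueue_work_duration_seconds_bucket` rows are cumulative histogram buckets, each labelled with an upper bound `le`. A latency percentile therefore has to be estimated from the buckets rather than read off directly. In Prometheus this is normally done with the PromQL `histogram_quantile()` function. The Python sketch below shows what that estimate does: it finds the bucket containing the target rank and interpolates linearly inside it. It assumes the bucket counts have already been scraped into `(upper_bound, cumulative_count)` pairs ending with the `+Inf` bucket. `histogram_quantile` here is a hypothetical local function, not a Prometheus client API, and it skips edge cases the real function handles, such as non-positive bucket bounds.

```python
import bisect
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch of PromQL's histogram_quantile(): buckets must be sorted
    by upper bound, hold cumulative counts, use positive finite bounds, and
    end with the +Inf bucket.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    finite_counts = [count for _, count in buckets[:-1]]
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = bisect.bisect_left(finite_counts, rank)
    if b == len(finite_counts):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[b]
    lower, below = (0.0, 0.0) if b == 0 else buckets[b - 1]
    # Linear interpolation inside the bucket, assuming evenly spread observations.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Example cumulative counts for workqueue_work_duration_seconds_bucket,
# keyed by the "le" label (made-up numbers for illustration).
example = [(0.001, 0), (0.01, 5), (0.1, 9), (1.0, 10), (math.inf, 10)]
print(histogram_quantile(0.95, example))
```

Against a live Prometheus server, you would normally get the same estimate directly in PromQL, for example `histogram_quantile(0.95, sum by (le) (rate(workqueue_work_duration_seconds_bucket[5m])))`.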