mirror of https://github.com/kubeflow/website.git
Update Spark Operator Prometheus Metrics Guide (#3983)
Signed-off-by: Ratnopam Chakrabarti <ratnopamc@yahoo.com>
parent 39839caf8a
commit 51e4aeb21c
@@ -228,30 +228,30 @@ If enabled, the operator generates the following metrics:
 | Metric | Description |
 | ------------- | ------------- |
-| `spark_app_count` | Total number of SparkApplication handled by the Operator.|
-| `spark_app_submit_count` | Total number of SparkApplication spark-submitted by the Operator.|
-| `spark_app_success_count` | Total number of SparkApplication which completed successfully.|
-| `spark_app_failure_count` | Total number of SparkApplication which failed to complete. |
-| `spark_app_running_count` | Total number of SparkApplication which are currently running.|
-| `spark_app_success_execution_time_microseconds` | Execution time for applications which succeeded.|
-| `spark_app_failure_execution_time_microseconds` | Execution time for applications which failed. |
-| `spark_app_start_latency_microseconds` | Start latency of SparkApplication as type of [Prometheus Summary](https://prometheus.io/docs/concepts/metric_types/#summary). |
-| `spark_app_start_latency_seconds` | Start latency of SparkApplication as type of [Prometheus Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram). |
-| `spark_app_executor_success_count` | Total number of Spark Executors which completed successfully. |
-| `spark_app_executor_failure_count` | Total number of Spark Executors which failed. |
-| `spark_app_executor_running_count` | Total number of Spark Executors which are currently running. |
+| `spark_application_count` | Total number of SparkApplication handled by the Operator.|
+| `spark_application_submit_count` | Total number of SparkApplication spark-submitted by the Operator.|
+| `spark_application_success_count` | Total number of SparkApplication which completed successfully.|
+| `spark_application_failure_count` | Total number of SparkApplication which failed to complete. |
+| `spark_application_running_count` | Total number of SparkApplication which are currently running.|
+| `spark_application_success_execution_time_seconds` | Execution time for applications which succeeded.|
+| `spark_application_failure_execution_time_seconds` | Execution time for applications which failed. |
+| `spark_application_start_latency_seconds` | Start latency of SparkApplication as type of [Prometheus Summary](https://prometheus.io/docs/concepts/metric_types/#summary). |
+| `spark_application_start_latency_seconds` | Start latency of SparkApplication as type of [Prometheus Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram). |
+| `spark_executor_success_count` | Total number of Spark Executors which completed successfully. |
+| `spark_executor_failure_count` | Total number of Spark Executors which failed. |
+| `spark_executor_running_count` | Total number of Spark Executors which are currently running. |

 #### Work Queue Metrics

 | Metric | Description |
 | ------------- | ------------- |
-| `spark_application_controller_depth` | Current depth of workqueue |
-| `spark_application_controller_adds` | Total number of adds handled by workqueue |
-| `spark_application_controller_latency` | Latency for workqueue |
-| `spark_application_controller_work_duration` | How long processing an item from workqueue takes |
-| `spark_application_controller_retries` | Total number of retries handled by workqueue |
-| `spark_application_controller_unfinished_work_seconds` | Unfinished work in seconds |
-| `spark_application_controller_longest_running_processor_microseconds` | Longest running processor in microseconds |
+| `workqueue_depth` | Current depth of workqueue |
+| `workqueue_adds_total` | Total number of adds handled by workqueue |
+| `workqueue_queue_duration_seconds_bucket` | How long in seconds an item stays in workqueue before being requested |
+| `workqueue_work_duration_seconds_bucket` | How long in seconds processing an item from workqueue takes |
+| `workqueue_retries_total` | Total number of retries handled by workqueue |
+| `workqueue_unfinished_work_seconds` | Unfinished work in seconds |
+| `workqueue_longest_running_processor_seconds` | Longest running processor in seconds |

 The following is a list of all the configurations the operators supports for metrics:
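Dashboards and alert rules that still reference the metric names removed by this change would need the new names. The following is a minimal sketch of such a rewrite, not part of the commit. It assumes the old and new names correspond one-to-one in the row order of the two tables; `RENAMED_METRICS` and `rewrite_expr` are hypothetical names introduced for illustration.

```python
import re

# Old name -> new name, paired by row order in the diff (an assumption,
# the commit itself does not state the mapping explicitly).
RENAMED_METRICS = {
    "spark_app_count": "spark_application_count",
    "spark_app_submit_count": "spark_application_submit_count",
    "spark_app_success_count": "spark_application_success_count",
    "spark_app_failure_count": "spark_application_failure_count",
    "spark_app_running_count": "spark_application_running_count",
    "spark_app_success_execution_time_microseconds":
        "spark_application_success_execution_time_seconds",
    "spark_app_failure_execution_time_microseconds":
        "spark_application_failure_execution_time_seconds",
    "spark_app_start_latency_microseconds":
        "spark_application_start_latency_seconds",
    "spark_app_start_latency_seconds":
        "spark_application_start_latency_seconds",
    "spark_app_executor_success_count": "spark_executor_success_count",
    "spark_app_executor_failure_count": "spark_executor_failure_count",
    "spark_app_executor_running_count": "spark_executor_running_count",
    "spark_application_controller_depth": "workqueue_depth",
    "spark_application_controller_adds": "workqueue_adds_total",
    "spark_application_controller_latency":
        "workqueue_queue_duration_seconds_bucket",
    "spark_application_controller_work_duration":
        "workqueue_work_duration_seconds_bucket",
    "spark_application_controller_retries": "workqueue_retries_total",
    "spark_application_controller_unfinished_work_seconds":
        "workqueue_unfinished_work_seconds",
    "spark_application_controller_longest_running_processor_microseconds":
        "workqueue_longest_running_processor_seconds",
}

# Match whole identifiers only. "_" counts as a word character, so \b keeps
# "spark_app_count" from matching inside a longer metric name.
_OLD_NAME = re.compile(
    r"\b(" + "|".join(
        re.escape(name)
        for name in sorted(RENAMED_METRICS, key=len, reverse=True)
    ) + r")\b"
)


def rewrite_expr(expr: str) -> str:
    """Replace old metric names in a query string with their new names."""
    return _OLD_NAME.sub(lambda m: RENAMED_METRICS[m.group(1)], expr)


if __name__ == "__main__":
    print(rewrite_expr("rate(spark_app_failure_count[5m])"))
    print(rewrite_expr("spark_application_controller_depth > 10"))
```

The sketch only renames exact identifiers. Series with an extra suffix appended to an old name are left untouched, and the names suggest that some values moved from microseconds to seconds, so any thresholds in rewritten expressions would need to be checked by hand.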