Update metrics doc (#350)

Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
Peng Gao 2021-04-13 20:06:03 +08:00 committed by GitHub
parent 424088cef4
commit 1ff487d111
1 changed files with 14 additions and 0 deletions

@@ -200,6 +200,20 @@ Variables: horovod
total images/sec: 308.27
```
## Exposed Metrics
| Metric name | Metric type | Description | Labels |
| ----------- | ----------- | ----------- | ------ |
|mpi\_operator\_jobs\_created\_total | Counter | Number of MPI jobs created | |
|mpi\_operator\_jobs\_successful\_total | Counter | Number of MPI jobs that completed successfully | |
|mpi\_operator\_jobs\_failed\_total | Counter | Number of MPI jobs that failed | |
|mpi\_operator\_job\_info | Gauge | Information about MPIJob | `launcher`=&lt;launcher-pod-name&gt; <br> `namespace`=&lt;job-namespace&gt; |
### Join Metrics
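With [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics), you can join these metrics with pod-level metrics on shared labels. Because `mpi_operator_job_info` identifies the launcher pod through its `launcher` label rather than a `pod` label, copy `launcher` into `pod` with `label_replace` before joining.
For example: `kube_pod_info * on(pod,namespace) group_left label_replace(mpi_operator_job_info, "pod", "$0", "launcher", ".*")`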
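
To make the join easier to follow, here is a minimal Python sketch of what the query does to the label sets. It is an illustration only, not how Prometheus is implemented: the helper names (`label_replace_copy`, `join_group_left`) and all sample label values (`pi-launcher`, `node-a`, and so on) are hypothetical, and the metric name follows the Exposed Metrics table.

```python
def label_replace_copy(sample, dst, src):
    """Mimic label_replace(v, dst, "$0", src, ".*"): copy the full src value into dst."""
    labels = dict(sample["labels"])
    labels[dst] = labels.get(src, "")
    return {"labels": labels, "value": sample["value"]}


def join_group_left(left, right, on):
    """Mimic `left * on(...) group_left right`.

    Each left sample is matched to the right sample that has the same values
    for the `on` labels. The result keeps the left sample's labels, and its
    value is the product of the two values.
    """
    index = {tuple(r["labels"].get(k, "") for k in on): r for r in right}
    joined = []
    for sample in left:
        key = tuple(sample["labels"].get(k, "") for k in on)
        match = index.get(key)
        if match is not None:
            joined.append({
                "labels": dict(sample["labels"]),
                "value": sample["value"] * match["value"],
            })
    return joined


# Hypothetical samples standing in for kube_pod_info (from kube-state-metrics).
kube_pod_info = [
    {"labels": {"pod": "pi-launcher", "namespace": "default", "node": "node-a"}, "value": 1},
    {"labels": {"pod": "unrelated-pod", "namespace": "default", "node": "node-b"}, "value": 1},
]

# Hypothetical sample standing in for mpi_operator_job_info.
mpi_operator_job_info = [
    {"labels": {"launcher": "pi-launcher", "namespace": "default"}, "value": 1},
]

# Step 1: give the job-info samples a `pod` label copied from `launcher`.
job_info_with_pod = [
    label_replace_copy(s, "pod", "launcher") for s in mpi_operator_job_info
]

# Step 2: join on (pod, namespace); only the launcher pod survives the join.
result = join_group_left(kube_pod_info, job_info_with_pod, on=("pod", "namespace"))

for sample in result:
    print(sample["labels"], sample["value"])
```

Only pods that are MPIJob launchers appear in the result, and each keeps the extra labels from `kube_pod_info` (here, the hypothetical `node` label).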
# Docker Images
Docker images are built and pushed automatically to [mpioperator on Docker Hub](https://hub.docker.com/u/mpioperator). You can use the following Dockerfiles to build the images yourself: