From 1ff487d111d4c704bf17050123ef0dc38ab2dde2 Mon Sep 17 00:00:00 2001
From: Peng Gao
Date: Tue, 13 Apr 2021 20:06:03 +0800
Subject: [PATCH] Update metrics doc (#350)

Signed-off-by: Peng Gao
---
 README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/README.md b/README.md
index 954e052..bec9085 100644
--- a/README.md
+++ b/README.md
@@ -200,6 +200,20 @@ Variables: horovod
 total images/sec: 308.27
 ```
 
+## Exposed Metrics
+
+| Metric name | Metric type | Description | Labels |
+| ----------- | ----------- | ----------- | ------ |
+|mpi\_operator\_jobs\_created\_total | Counter | Number of MPI jobs created | |
+|mpi\_operator\_jobs\_successful\_total | Counter | Number of MPI jobs that succeeded | |
+|mpi\_operator\_jobs\_failed\_total | Counter | Number of MPI jobs that failed | |
+|mpi\_operator\_job\_info | Gauge | Information about an MPIJob | `launcher`=<launcher-pod-name> <br> `namespace`=<job-namespace> |
+
+### Join Metrics
+
+With [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics), you can join these metrics with pod metrics on shared labels.
+For example, `kube_pod_info * on(pod,namespace) group_left label_replace(mpi_operator_job_info, "pod", "$0", "launcher", ".*")` copies the `launcher` label into `pod` so that each job's info series matches its launcher pod in `kube_pod_info`.
+
 # Docker Images
 
 Docker images are built and pushed automatically to [mpioperator on Dockerhub](https://hub.docker.com/u/mpioperator). You can use the following Dockerfiles to build the images yourself: