mirror of https://github.com/kubeflow/trainer.git
* feat(sdk): Support MPI-based TrainJobs
* Refactor list_runtimes
* Fix example
* Add Runtime Trainer object
* Update for new Runtime object
* Implement get_runtime API
* Fix Torch example
* Remove un-unsed consts
* Update func args
* Update SDK constants
* Change to 16Gi
* Fix container name for MPI
* Keep launcher container for MPI

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
mnist.ipynb