5 Commits

Author SHA1 Message Date
Mateusz Kubica 21f326d1d2
MPICH support (#562)
* Add support for MPICH

* Fix CI errors

* Temporary: manual trigger

* Fix file name

* Add an empty line at the end of the file

* Fix formatting

* Revert "Temporary: manual trigger"

This reverts commit 15164a8b70.

* fix formatting

* Regenerate the mpi-operator.yaml

* Adding an empty line at the end of Dockerfiles

* Share the same entrypoint for Intel and MPICH

* share hostfile generation between Intel and MPICH

* Add validation test for MPICH

* Fix formatting

* Don't over engineer the tests - be explicit

* add non-root tests for IntelMPI and MPICH
2023-06-16 17:57:36 +00:00
HeGaoYuan 18b1822f4a
typo on proposals/elastic-horovod.md (#430) 2021-10-08 04:24:23 -07:00
Aldo Culquicondor 39d2108515
Propose a new architecture with focus on scalability and robustness (#360)
* Propose a new architecture with focus on scalability and robustness

* incorporate comments and fix assumptions about security
2021-06-21 10:34:54 -07:00
junfan.zhang d56bdb56d0
Fix typo (#349) 2021-04-21 14:50:22 +08:00
Wang Zhang f788e75925
propose elastic training with horovod with mpi-operator (#335) 2021-03-11 22:21:25 -08:00