mirror of https://github.com/kubeflow/examples.git
* Refactor Python module:
  - Replace MPI with Gloo as the backend to avoid having to recompile PyTorch
  - Replace the DistributedDataParallel() class with the official version when using GPUs
  - Remove the unnecessary method that disabled logs in workers
  - Refactor run()
* Simplify the Dockerfile by using the official PyTorch 0.4 image with CUDA and remove the mpirun call

| Name |
|---|
| cpu |
| gpu |
| Dockerfile.traincpu |
| Dockerfile.traingpu |
| build_image.sh |
| mnist_DDP.py |
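
The backend change described in the commit message can be illustrated with a minimal sketch. This is not the contents of `mnist_DDP.py`: the function name `train_one_step`, the tiny linear model, and the random batch are hypothetical, and the code assumes a recent PyTorch release rather than 0.4. It only shows the general pattern of initializing `torch.distributed` with the Gloo backend and wrapping a model in the official `DistributedDataParallel` class.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


def train_one_step(rank: int = 0, world_size: int = 1) -> float:
    """Run one optimizer step on random data (hypothetical helper)."""
    # The default env:// init method reads MASTER_ADDR and MASTER_PORT.
    # The fallback values below are assumptions for a local single-process run.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Gloo ships with the standard PyTorch builds, so no MPI-enabled rebuild is needed.
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
    try:
        torch.manual_seed(0)
        if torch.cuda.is_available():
            # One GPU per process, chosen from the rank.
            device = torch.device("cuda", rank % torch.cuda.device_count())
            model = DistributedDataParallel(
                nn.Linear(4, 2).to(device), device_ids=[device.index]
            )
        else:
            device = torch.device("cpu")
            model = DistributedDataParallel(nn.Linear(4, 2))

        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        inputs = torch.randn(8, 4, device=device)
        targets = torch.randint(0, 2, (8,), device=device)

        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()  # gradients are averaged across all processes here
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print(f"loss after one step: {train_one_step():.4f}")
```

Run as a single process, the sketch behaves like ordinary training. With several workers, each process would call the function with its own `rank` and the shared `world_size`, and all of them would point at the same `MASTER_ADDR` and `MASTER_PORT`.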