Network File System Setup

In this part, you install the Network File System (NFS) packages and set up an NFS server. The later steps depend on it: distributed training writes each machine's trained data back to the shared NFS folder, and once all machines have finished training, the results are consolidated into a single weight file.

Install Network File System (NFS)

Install the NFS server package on the server machine and the NFS client package on each client machine.

# NFS Server
sudo apt-get install nfs-kernel-server

# NFS Client
sudo apt-get install nfs-common

On both the server and the client, create the mount directory and set its permissions.

# NFS Server
sudo mkdir -p /mnt/(your mount directory folder name on the server)
sudo chmod -R 777 /mnt/(your mount directory folder name on the server)

# NFS Client
sudo mkdir -p /mnt/(your mount directory folder name on the client)
sudo chmod -R 777 /mnt/(your mount directory folder name on the client)
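
The two commands above can be wrapped in a small helper so that the same directory name is used on every machine. This is only a minimal sketch: the function name `make_share_dir` and the example path `/mnt/nfs_share` are assumptions for illustration and do not come from this tutorial.

```shell
# Hypothetical helper: create a shared directory and open its permissions.
# Run it with root privileges when the path is under /mnt.
make_share_dir() {
    share_dir="$1"
    mkdir -p "$share_dir" || return 1   # create the directory and any missing parents
    chmod -R 777 "$share_dir"           # read/write/execute for all users, as in the step above
}

# Example (the path is an assumed placeholder; uncomment and run as root):
# make_share_dir /mnt/nfs_share
```

Calling the helper twice on the same path is harmless, because `mkdir -p` does not fail when the directory already exists.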

Add the mount directory and the IP addresses of the client and server machines to the /etc/exports file, as shown in Figure 1.

 cd /etc
 sudo gedit exports

Figure 1. Example of export data.
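
An entry in /etc/exports has the form `directory client(options)`. The sketch below builds one such line. The directory `/mnt/nfs_share`, the client address `192.168.1.20`, and the options `rw,sync,no_subtree_check` are assumed example values, not necessarily the ones shown in the figure, and `make_export_line` is a hypothetical helper name.

```shell
# Hypothetical helper: print one /etc/exports entry.
# Format: <exported directory> <client>(<comma-separated options>)
make_export_line() {
    printf '%s %s(%s)\n' "$1" "$2" "$3"
}

# Example values are assumptions; substitute your own directory and client IP.
make_export_line /mnt/nfs_share 192.168.1.20 rw,sync,no_subtree_check

# To append the entry and reload the export table on the server (needs root):
# make_export_line /mnt/nfs_share 192.168.1.20 rw,sync,no_subtree_check | sudo tee -a /etc/exports
# sudo exportfs -ra
```

Running `sudo exportfs -ra` after editing the file re-exports all directories, so the new entry takes effect on an NFS server that is already running.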

Finally, start the NFS server and check its status, as shown in Figure 2. If the status output displays a green light, the server has started successfully.

sudo service nfs-server start
sudo service nfs-server status

Figure 2. Example of NFS server status.

After completing the settings above, run the following command on the client to mount the server's shared directory, as shown in Figure 3.

sudo mount -t nfs (NFS Server IP):/mnt/(your mount directory folder name on the server) /mnt/(your mount directory folder name on the client) -o nolock

Figure 3. Example of the mounted directory.
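
One way to confirm on the client that the mount succeeded is to look for the mount point in the kernel's mount table. This is a minimal sketch, assuming a Linux client where `/proc/mounts` is available. The function name `is_nfs_mounted` and the example path `/mnt/nfs_share` are hypothetical.

```shell
# Hypothetical helper: succeed if the given path is mounted with an NFS filesystem type.
# The optional second argument selects the mount table to read (default: /proc/mounts).
is_nfs_mounted() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    # Fields in the mount table: device, mount point, filesystem type, options, ...
    awk -v mp="$mount_point" \
        '$2 == mp && $3 ~ /^nfs4?$/ { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Example (the path is an assumed placeholder):
# if is_nfs_mounted /mnt/nfs_share; then echo "mounted"; else echo "not mounted"; fi
```

The check accepts both `nfs` and `nfs4` as the filesystem type, because the type reported depends on the protocol version negotiated with the server.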

Previous: Create a new cluster and deploy the Kubeflow on local Kubernetes

Next: Setup storageclass and PVC