Update document for GPU support

The current guide is created two years ago and the content is out
of date.
This commit is contained in:
Yujun Zhang 2018-12-21 15:08:07 +08:00
parent 7af77bb79c
commit 3b21a193d0
1 changed files with 3 additions and 72 deletions

View File

@ -1,74 +1,5 @@
# GPU support
# GPU Support
```
kops create cluster gpu.example.com --zones us-east-1c --node-size p2.xlarge --node-count 1 --kubernetes-version 1.6.1
```
You can use [kops hooks](./cluster_spec.md#hooks) to install [Nvidia kubernetes device plugin](https://github.com/NVIDIA/k8s-device-plugin) and enable GPU support in cluster.
(Note that the p2.xlarge instance type is not cheap, but no GPU instances are)
You can use the experimental hooks feature to install the nvidia drivers:
`> kops edit cluster gpu.example.com`
```
spec:
...
hooks:
- execContainer:
image: kopeio/nvidia-bootstrap:1.6
```
(TODO: Only on instance groups, or have nvidia-bootstrap detect if GPUs are present..)
In addition, you will likely want to set the `Accelerators=true` feature-flag to kubelet:
`> kops edit cluster gpu.example.com`
```
spec:
...
kubelet:
featureGates:
Accelerators: "true"
```
`> kops update cluster gpu.example.com --yes`
Here is an example pod that runs tensorflow; note that it mounts libcuda from the host:
(TODO: Is there some way to have a well-known volume or similar?)
```
apiVersion: v1
kind: Pod
metadata:
name: tf
spec:
containers:
- image: gcr.io/tensorflow/tensorflow:1.0.1-gpu
imagePullPolicy: IfNotPresent
name: gpu
command:
- /bin/bash
- -c
- "cp -d /rootfs/usr/lib/x86_64-linux-gnu/libcuda.* /usr/lib/x86_64-linux-gnu/ && cp -d /rootfs/usr/lib/x86_64-linux-gnu/libnvidia* /usr/lib/x86_64-linux-gnu/ &&/run_jupyter.sh"
resources:
limits:
cpu: 2000m
alpha.kubernetes.io/nvidia-gpu: 1
volumeMounts:
- name: rootfs-usr-lib
mountPath: /rootfs/usr/lib
volumes:
- name: rootfs-usr-lib
hostPath:
path: /usr/lib
```
To use this particular tensorflow image, you should port-forward and get the URL from the log:
```
kubectl port-forward tf 8888 &
kubectl logs tf
```
And browse to the URL printed
See instructions in [kops hooks for nvidia-device-plugin](../hooks/nvidia-device-plugin).