+++
title = "Set Up Your Notebooks"
description = "Getting started with Jupyter notebooks on Kubeflow"
weight = 10
+++
Your Kubeflow deployment includes services for spawning and managing Jupyter
notebooks.
You can set up multiple *notebook servers* per Kubeflow deployment. Each
notebook server can include multiple *notebooks*. Each notebook server belongs
to a single *namespace*, which corresponds to the project group or team for that
server.
This guide shows you how to set up a notebook server for your Jupyter notebooks
in Kubeflow.
## Quick guide
Summary of steps:
1. Follow the [Kubeflow getting-started guide](/docs/started/getting-started/) to
set up your Kubeflow deployment and open the Kubeflow UI.
1. Click **Notebooks** in the left-hand panel of the Kubeflow UI.
1. Click **NEW SERVER** to create a notebook server.
1. When the notebook server provisioning is complete, click **CONNECT**.
1. Click **Upload** to upload an existing notebook, or click **New** to
create an empty notebook.
The rest of this page contains details of the above steps.
## Install Kubeflow and open the Kubeflow UI
Follow the [Kubeflow getting-started guide](/docs/started/getting-started/) to
set up your Kubeflow deployment in your environment of choice (locally, on
premises, or in the cloud).
When Kubeflow is running, access the Kubeflow UI as described in the
getting-started guide for your chosen environment. For example:
* If you deployed Kubeflow on Google Cloud Platform (GCP), the Kubeflow UI is
available at the following URI:
```
https://<deployment_name>.endpoints.<project>.cloud.goog/
```
* If you set up port forwarding to the Ambassador service, the Kubeflow UI is
available at the following URI:
```
http://localhost:8080/
```
* For other environments, see the getting-started guide for your chosen
environment.
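If you haven't already set up the port forwarding, a typical invocation looks like the following sketch. Note that the `ambassador` service name, the `kubeflow` namespace, and the target port `80` are assumptions that may differ in your deployment:

```shell
# Forward local port 8080 to the Ambassador service in the kubeflow namespace.
# Adjust the namespace, service name, and target port to match your deployment.
kubectl port-forward -n kubeflow svc/ambassador 8080:80

# The Kubeflow UI is then available at http://localhost:8080/
```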
## Create a Jupyter notebook server and add a notebook
1. Click **Notebooks** in the left-hand panel of the Kubeflow UI to access the
Jupyter notebook services deployed with Kubeflow:
<img src="/docs/images/jupyterlink.png"
alt="Opening notebooks from the Kubeflow UI"
class="mt-3 mb-3 border border-info rounded">
1. Sign in:
* On GCP, sign in using your Google Account. (If you have already logged in
to your Google Account you may not need to log in again.)
* On all other platforms, sign in using any username and password.
1. Click **NEW SERVER** on the **Notebook Servers** page:
<img src="/docs/images/add-notebook-server.png"
alt="The Kubeflow notebook servers page"
class="mt-3 mb-3 border border-info rounded">
You should see the **New Notebook Server** page:
<img src="/docs/images/new-notebook-server.png"
alt="Form for adding a Kubeflow notebook server"
class="mt-3 mb-3 border border-info rounded">
1. Enter a **name** of your choice for the notebook server. The name can
include letters and numbers, but no spaces. For example, `my-first-notebook`.
1. Enter a **namespace** to identify the project group or team to which this
notebook server belongs. The default is `kubeflow`.
1. Select a Docker **image** for the baseline deployment of your notebook
server. You can choose from a range of *standard* images or specify a
*custom* image:
* **Standard**: The standard Docker images include typical machine learning
(ML) packages that you can use within your Jupyter notebooks on
this notebook server. Select an image from the **Image** dropdown menu.
The image names indicate the following choices:
* A TensorFlow version (for example, `tensorflow-1.13.1`). Kubeflow offers
a CPU and a GPU image for each minor version of TensorFlow.
* `cpu` or `gpu`, depending on whether you want to train your model on a CPU
or a GPU.
* If you choose a GPU image, make sure that you have GPUs
available in your Kubeflow cluster. Run the following command to check
if there are any GPUs available:
`kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"`
* If you have GPUs available, you can schedule your server on a GPU node
in the **Extra Resources** section at the bottom of the form. For
example, to reserve two GPUs, enter the following JSON code:
`{"nvidia.com/gpu": 2}`
* Kubeflow version (for example, `v0.5.0`).
* **Custom**: If you select the custom option, you must specify a Docker image
in the form `registry/image:tag`. For guidelines on creating a Docker
image for your notebook, see the guide to
[creating a custom Jupyter image](/docs/notebooks/custom-notebook/).
1. Specify the total amount of **CPU** that your notebook server should reserve.
The default is `0.5`. For CPU-intensive jobs, you can choose more than one CPU
(for example, `1.5`).
1. Specify the total amount of **memory** (RAM) that your notebook server should
reserve. The default is `1.0Gi`.
1. Specify a **workspace volume** to hold your personal workspace for this
notebook server. Kubeflow provisions a
[Kubernetes persistent volume (PV)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) for your workspace volume. The PV ensures that you can
retain data even if you destroy your notebook server.
* The default is to create a new volume for your workspace with the
following configuration:
* Name: The volume name is synced with the name of the notebook server.
When you start typing the notebook server name, the volume name takes
the same value. You can edit the volume name, but if you later edit the
notebook server name, the volume name changes to match the notebook
server name.
* Size: `10Gi`
* Mount path: `/home/jovyan`
* Access mode: `ReadWriteOnce`. This setting means that the volume can be
mounted as read-write by a single node. See the
[Kubernetes documentation](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) for more details about access modes.
* Alternatively, you can point the notebook server at an existing volume by
specifying the name, mount path, and access mode for the existing volume.
1. *(Optional)* Specify one or more **data volumes** if you want to store and
access data from the notebooks on this notebook server. You can add new
volumes or specify existing volumes. Kubeflow provisions a
[Kubernetes persistent volume (PV)](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) for each of your data volumes.
1. Click **SPAWN** and wait a while. You should see an entry for your new
notebook server on the **Notebook Servers** page, with a spinning indicator in
the **Status** column. It can take a few minutes to set up
the notebook server.
* You can check the status of your Pod by hovering your mouse cursor over
the icon in the **Status** column next to the entry for your notebook
server. For example, if the image is downloading then the status spinner
has a tooltip that says `ContainerCreating`.
Alternatively, you can check the Pod status by entering the following
command:
```
kubectl -n <NAMESPACE> describe pods jupyter-<USERNAME>
```
Where `<NAMESPACE>` is the namespace you specified earlier
(default `kubeflow`) and `<USERNAME>` is the name you used to log in.
**A note for GCP users:** If you have IAP turned on, the Pod has
a different name. For example, if you signed in as `USER@DOMAIN.EXT`
the Pod has a name of the following form:
```
jupyter-accounts-2egoogle-2ecom-3USER-40DOMAIN-2eEXT
```
1. When the notebook server provisioning is complete, you should see an entry
for your server on the **Notebook Servers** page, with a check mark in the
**Status** column:
<img src="/docs/images/notebook-servers.png"
alt="Opening notebooks from the Kubeflow UI"
class="mt-3 mb-3 border border-info rounded">
1. Click **CONNECT** to start the notebook server.
1. When the notebook server is running, you should see the Jupyter dashboard
interface. If you requested a new workspace, the dashboard should be empty
of notebooks:
<img src="/docs/images/jupyter-dashboard.png"
alt="Jupyter dashboard with no notebooks"
class="mt-3 mb-3 border border-info rounded">
1. Click **Upload** to upload an existing notebook, or click **New** to
create an empty notebook. You can read about using notebooks in the
[Jupyter documentation](https://jupyter-notebook.readthedocs.io/en/latest/notebook.html#notebook-user-interface).
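As a sketch, you can also confirm from the command line that the server's Pod and workspace volume were provisioned. The `kubeflow` namespace below is the default; replace it with the namespace you chose when creating the server:

```shell
# List Pods in the namespace; the notebook server Pod should be Running.
kubectl -n kubeflow get pods

# The workspace volume should appear as a bound PersistentVolumeClaim.
kubectl -n kubeflow get pvc
```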
## Experiment with your notebook
The default notebook image includes all the plugins that you need to train a
TensorFlow model with Jupyter, including
[TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard)
for rich visualizations and insights into your model.
To test your Jupyter installation, you can run a basic 'hello world' program
(adapted from
[mnist_softmax.py](https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py)) as follows:
1. Use the Jupyter dashboard to create a new **Python 3** notebook.
1. Copy the following code and paste it into a code block in your notebook:
```
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

import tensorflow as tf

# Model: single-layer softmax regression over flattened 28x28 images.
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross-entropy loss against the one-hot labels.
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()

# Train for 1000 steps on mini-batches of 100 images.
for _ in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Evaluate accuracy on the held-out test set.
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy: ", sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
```
1. Run the code. You should see a number of `WARNING` messages from TensorFlow,
followed by a line showing the training accuracy, similar to this:
```
Accuracy: 0.9012
```
Note that on most cloud providers the notebook server is exposed to the
internet on a public IP address and is an unsecured endpoint by default.
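To make the computation above more concrete, here is a hedged NumPy sketch of the same softmax-regression forward pass and accuracy calculation, using a tiny synthetic batch instead of the MNIST data (all names and shapes below are illustrative):

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 784))   # a synthetic batch of 100 "images"
W = np.zeros((784, 10))           # weights, as in tf.zeros([784, 10])
b = np.zeros(10)                  # biases

# Forward pass: predicted class probabilities for each image.
y = softmax(x @ W + b)

# With zero weights, every class gets equal probability 1/10.
print(y[0])

# Accuracy: fraction of rows where argmax(prediction) == argmax(label).
labels = np.eye(10)[rng.integers(0, 10, size=100)]   # random one-hot labels
accuracy = np.mean(np.argmax(y, axis=1) == np.argmax(labels, axis=1))
```

Because the weights start at zero, the untrained model predicts a uniform distribution; the TensorFlow code improves on this by taking gradient-descent steps on the cross-entropy loss.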
## Next steps
* Explore [Kubeflow Fairing](/docs/fairing/) for a complete solution to
building, training, and deploying an ML model from a notebook.
* Learn the advanced features available from a Kubeflow notebook, such as
[submitting Kubernetes resources](/docs/notebooks/submit-kubernetes/) or
[building Docker images](/docs/notebooks/submit-docker-image/).