Update demo script & add notebook (#248)

* Update demo script

  Update demo script to include deploy script and notebook created by @drscott173
  Simplify by removing unnecessary commands
  Use default namespace instead of kubeflow

* Add yelp notebook readme

* Add cluster creation commands

  Add instructions for highlighting changes resulting from each command
2018-09-11 14:17:02 -04:00 · 2018-09-11 14:17:02 -04:00 · 42592fed4a
parent 8e30631c54
commit 42592fed4a
8 changed files with 1419 additions and 124 deletions
--- a/demos/yelp_demo/README.md
+++ b/demos/yelp_demo/README.md
@ -6,75 +6,77 @@ presentation to public audiences.
 The base demo includes the following steps:

 1. [Install kubeflow locally](#1-install-kubeflow-locally)
-1. [Bring up a notebook](#2-bring-up-a-notebook)
-1. [Run training locally](#3-run-training-locally)
-1. [Install kubeflow on GKE](#4-install-kubeflow-on-gke)
-1. [Run training on GKE](#5-run-training-on-gke)
-1. [Create the serving and UI components](#6-create-the-serving-and-ui-components)
+1. [Run training locally](#2-run-training-locally)
+1. [Install kubeflow on GKE](#3-install-kubeflow-on-gke)
+1. [Run training on GKE](#4-run-training-on-gke)
+1. [Create the serving and UI components](#5-create-the-serving-and-ui-components)
+1. [Bring up a notebook](#6-bring-up-a-notebook)

 ## Important! Pre-work

-Before completing any of the below steps, follow the instructions in [./demo_setup](./demo_setup/README.md) to prepare for demonstrating Kubeflow.
+Before completing any of the below steps, follow the instructions in
+[./demo_setup](https://github.com/kubeflow/examples/blob/master/demos/yelp_demo/demo_setup/README.md)
+to prepare for demonstrating Kubeflow:
+
+* [Create a minikube cluster](https://github.com/kubeflow/examples/blob/master/demos/yelp_demo/demo_setup/README.md#4-create-a-minikube-cluster)
+* [Create a GKE cluster](https://github.com/kubeflow/examples/blob/master/demos/yelp_demo/demo_setup/README.md#5-create-a-gke-cluster)
+
+## 1. Create clusters
+
+[Set up your environment](https://github.com/kubeflow/examples/tree/master/demos/yelp_demo/demo_setup#2-set-environment-variables)
+or source the base file:
+
+```
+cd demo_setup
+source kubeflow-demo-base.env
+```
+
+Create a minikube cluster:
+
+```
+minikube start \
+  --cpus 4 \
+  --memory 8096 \
+  --disk-size=50g \
+  --kubernetes-version v1.10.6
+```
+
+Create a GKE cluster with access to GPUs and TPUs:
+
+```
+gcloud beta container clusters create ${CLUSTER} \
+  --project ${DEMO_PROJECT} \
+  --zone ${ZONE} \
+  --accelerator type=nvidia-tesla-k80,count=2 \
+  --cluster-version 1.10.6-gke.2 \
+  --enable-ip-alias \
+  --enable-tpu \
+  --machine-type n1-highmem-8 \
+  --scopes cloud-platform,compute-rw,storage-rw \
+  --verbosity error
+```

 ## 1. Install kubeflow locally

-Initialize a ksonnet app:
+Run the following script to create a ksonnet app for Kubeflow and deploy it:

 ```
-ks init kubeflow
-cd kubeflow
+export KUBEFLOW_VERSION=0.2.5
+curl https://raw.githubusercontent.com/kubeflow/kubeflow/v${KUBEFLOW_VERSION}/scripts/deploy.sh | bash
 ```

-Install packages and generate core components:
+View the installed components:

 ```
-ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
-ks pkg install kubeflow/core@${VERSION}
-ks pkg install kubeflow/tf-serving@${VERSION}
-ks pkg install kubeflow/tf-job@${VERSION}
-ks generate core kubeflow-core --name=kubeflow-core
+kubectl get pod
 ```

-Create the minikube environment and set the cloud parameter:
-
-```
-ks env add minikube --namespace=${NAMESPACE}
-ks param set --env minikube kubeflow-core \
-  cloud "minikube"
-```
-
-Apply kubeflow to the cluster:
-
-```
-ks apply minikube -c kubeflow-core
-```
-
-## 2. Bring up a notebook
-
-Connect to Jupyterhub by forwarding a port and opening a browser to
-[localhost:8000](localhost:8000):
-
-```
-kubectl port-forward tf-hub-0 8000:8000
-```
-
-Spawn a new pod with this image:
-
-```
-gcr.io/kubeflow-dev/issue-summarization-notebook-cpu:latest
-```
-
-Once the notebook environment is
-available, open a new terminal and upload this [Simple ML Model
-notebook](notebooks/simple_ml_model.ipynb).
-
-Execute the notebook to show that it works.
-
-## 3. Run training locally
+## 2. Run training locally

 Retrieve the following files for the t2tcpu & t2ttpu jobs:

 ```
+cd kubeflow_ks_app
 cp ${REPO_PATH}/demo/components/t2t[ct]pu.* components
 cp ${REPO_PATH}/demo/components/params.* components
 ```
@ -82,43 +84,42 @@ cp ${REPO_PATH}/demo/components/params.* components
 Set parameter values for training:

 ```
-ks param set --env minikube t2tcpu \
+ks param set --env default t2tcpu \
  cloud "minikube"
-ks param set --env minikube t2tcpu \
-  dataDir ${GCS_TRAINING_DATA_DIR}
-ks param set --env minikube t2tcpu \
-  outputGCSPath ${GCS_TRAINING_OUTPUT_DIR_LOCAL}
-ks param set --env minikube t2tcpu \
-  cpuImage gcr.io/${DEMO_PROJECT}/kubeflow-yelp-demo-cpu:latest
-ks param set --env minikube t2tcpu \
+ks param set --env default t2tcpu \
  workers 2
 ```

 Generate manifests and apply to cluster:

 ```
-ks apply minikube -c t2tcpu
+ks apply default -c t2tcpu
 ```

-## 4. Install kubeflow on GKE
+View the new training pod and wait until it has a `Running` status:
+
+```
+kubectl get pod
+```
+
+View the logs to watch training commence:
+
+```
+kubectl logs -f t2tcpu-master-0 | grep INFO:tensorflow
+```
+
+## 3. Install kubeflow on GKE

 Switch to a GKE cluster:

 ```
-kubectl config use-context gke
+kubectl config use gke
 ```

 Create an environment:

 ```
-ks env add gke --namespace=${NAMESPACE}
-```
-
-Set parameter values for kubeflow-core:
-
-```
-ks param set --env gke kubeflow-core \
-  cloud "gke"
+ks env add gke
 ```

 Install kubeflow on the cluster:
@ -127,15 +128,19 @@ Install kubeflow on the cluster:
 ks apply gke -c kubeflow-core
 ```

-## 5. Run training on GKE
+View the installed components:
+
+```
+kubectl get pod
+```
+
+## 4. Run training on GKE

 ### Distributed CPU training

 Set parameter values for training:

 ```
-ks param set --env gke t2tcpu \
-  dataDir ${GCS_TRAINING_DATA_DIR}
 ks param set --env gke t2tcpu \
  outputGCSPath ${GCS_TRAINING_OUTPUT_DIR_CPU}
 ```
@ -147,19 +152,16 @@ above 1000.
 ks apply gke -c t2tcpu
 ```

-#### Export the trained model
-
-This will export the model to an `export/` directory in output_dir.
+View the new training pod and wait until it has a `Running` status:

 ```
-cd ../yelp
-t2t-exporter \
-  --t2t_usr_dir=${USR_DIR} \
-  --model=${MODEL} \
-  --hparams_set=${HPARAMS_SET} \
-  --problem=${PROBLEM} \
-  --data_dir=${GCS_TRAINING_DATA_DIR} \
-  --output_dir=${GCS_TRAINING_OUTPUT_DIR_CPU}
+kubectl get pod
+```
+
+View the logs to watch training commence:
+
+```
+kubectl logs -f t2tcpu-master-0 | grep INFO:tensorflow
 ```

 ### Distributed TPU training
@ -167,8 +169,6 @@ t2t-exporter \
 Set parameter values for training:

 ```
-ks param set --env gke t2ttpu \
-  dataDir ${GCS_TRAINING_DATA_DIR}
 ks param set --env gke t2ttpu \
  outputGCSPath ${GCS_TRAINING_OUTPUT_DIR_TPU}
 ```
@ -180,22 +180,21 @@ above 1000.
 ks apply gke -c t2ttpu
 ```

-#### Export the trained model
-
-This will export the model to an `export/` directory in output_dir.
+Verify that a TPU is being provisioned by viewing pod status. It should remain
+in Pending state for 3-4 minutes with the message
+`Creating Cloud TPUs for pod default/t2ttpu-master-0`.

 ```
-cd ../yelp
-t2t-exporter \
-  --t2t_usr_dir=${USR_DIR} \
-  --model=${MODEL} \
-  --hparams_set=${HPARAMS_SET} \
-  --problem=${PROBLEM} \
-  --data_dir=${GCS_TRAINING_DATA_DIR} \
-  --output_dir=${GCS_TRAINING_OUTPUT_DIR_TPU}
+kubectl describe pod t2ttpu-master-0
 ```

-## 6. Create the serving and UI components
+Once it has `Running` status, view the logs to watch training commence:
+
+```
+kubectl logs -f t2ttpu-master-0 | grep INFO:tensorflow
+```
+
+## 5. Create the serving and UI components

 Retrieve the following files for the serving & UI components:

@ -204,14 +203,6 @@ cp ${REPO_PATH}/demo/components/serving.* components
 cp ${REPO_PATH}/demo/components/ui.* components
 ```

-
-Set parameter values for serving:
-
-```
-ks param set --env gke serving \
-  modelPath ${GCS_TRAINING_OUTPUT_DIR_TPU}/export/Servo
-```
-
 Create the serving and UI components:

 ```
@ -228,8 +219,8 @@ UI_POD=$(kubectl get po -l app=kubeflow-demo-ui | \
 kubectl port-forward ${UI_POD} 8080:80
 ```

-Optional: Setup an SSH tunnel from your local laptop into the GCE instance connecting to
-GKE:
+Optional: If necessary, set up an SSH tunnel from your local laptop into the
+compute instance connecting to GKE:

 ```
 ssh $HOST -L 8080:localhost:8080
@ -240,4 +231,35 @@ To show the naive version, navigate to [localhost:8080](localhost:8080) from a b
 To show the ML version, navigate to
 [localhost:8080/kubeflow](localhost:8080/kubeflow) from a browser.

+## 6. Bring up a notebook
+
+Connect to the Central Dashboard by forwarding a port to one of the ambassador
+pods:
+
+```
+AMBASSADOR_POD=$(kubectl get po -l service=ambassador | \
+  grep ambassador | \
+  head -n 1 | \
+  cut -d " " -f 1 \
+)
+kubectl port-forward ${AMBASSADOR_POD} 8081:80
+```
+
+Open a browser and connect to [localhost:8081](localhost:8081).
+Show the TF-job dashboard, then click on Jupyterhub.
+Log in with any username and password combination and wait until the page
+refreshes. Spawn a new pod with these resource requirements:
+
+| Resource              | Value                                                                |
+| --------------------- | -------------------------------------------------------------------- |
+| Image                 | `gcr.io/kubeflow-images-public/tensorflow-1.7.0-notebook-gpu:v0.2.1` |
+| CPU                   | 2                                                                    |
+| Memory                | 48G                                                                  |
+| Extra Resource Limits | `{"nvidia.com/gpu":2}`                                               |
+
+Once the notebook environment is
+available, open a new terminal and upload this
+[Yelp notebook](notebooks/yelp.ipynb).
+
+Execute the notebook to show that it works.

--- a/demos/yelp_demo/demo_setup/README.md
+++ b/demos/yelp_demo/demo_setup/README.md
@ -340,20 +340,6 @@ minikube start \
  --kubernetes-version v1.10.6
 ```

-RBAC permissions allow your user to install kubeflow components on the cluster.
-
-```
-kubectl create clusterrolebinding cluster-admin-binding-${USER} \
-  --clusterrole cluster-admin \
-  --user $(gcloud config get-value account)
-./create_context.sh minikube ${NAMESPACE}
-```
-
-Create a namespace:
-```
-kubectl create namespace ${NAMESPACE}
-```
-
 ### Create k8s secrets

 Since our project is private, we need to provide access to resources via the use
--- a/demos/yelp_demo/demo_setup/kubeflow-demo-base.env
+++ b/demos/yelp_demo/demo_setup/kubeflow-demo-base.env
@ -1,12 +1,12 @@
 export DEMO_PROJECT=kubeflow-demo-base
-export NAMESPACE=kubeflow
+export NAMESPACE=default
 export ZONE=us-central1-a
 export CLUSTER=demo
 # Makefile uses project.
 export PROJECT=${DEMO_PROJECT}
 export ENV=gke
 export SVC_ACCT=minikube
-export VERSION=v0.2.5
+export KUBEFLOW_VERSION=v0.2.5
 export REPO_PATH=${HOME}/repos/kubeflow/examples/demos/yelp_demo

 export MAX_CASES=1000000
--- a/demos/yelp_demo/notebooks/README.md
+++ b/demos/yelp_demo/notebooks/README.md
@ -0,0 +1,39 @@
+## Yelp Sentiment Notebook Demo
+### yelp.ipynb
+
+We're trying to create a neural network that detects sentiment and
+predicts whether any given Yelp review (there are 5 million of them) is positive or negative.
+Reviews are made of words.  Maybe we can look at the set of words in a review and make a guess from that?
+
+Well, let's see.  How frequently do some words occur?  Let's grab a sample of 100k positive and
+100k negative reviews, 200k total.
+
+![Lots of noise](noise.png)
+
+The chart on the left shows that most words from 200,000
+reviews are indistinguishable en masse,
+which is why naive approaches may take a long time to converge.
+One approach is to look only at words that occur more frequently.
+When we cut off words that occur fewer than 50 times, we get
+the chart on the right. We see the beginning of a distribution, but it's weak.
+We have to clean up the signal for our network
+using a hypothesis of what's important.
+
+![Signal](signal.png)
+
+So we calculate a ratio for each word: how often does it appear
+in positive vs. negative reviews?  A value of 0 means the word appears
+only in negative reviews, 1.0 means it appears equally often in
+positive and negative reviews, and greater than 1.0 means more positive than
+negative. Neural networks like values between -1 and 1 centered at 0.
+Take the log(x)!  log(1) = 0,  log(1/x) = negative (for x > 1), log(n) = small.
+Then let's cut out the words that don't help much, the area around 0.
+Voila.  A signal around 0, on the right!  Now let's train that on a simple network.
+
+![Results](semantics.png)
+
+The network converges in a few minutes with 94% accuracy.
+The weights on the input nodes are a vector representation of
+semantics, so vectors close together are similar.
+Here we see words that are somewhat similar, with some noise,
+as we project from a 64-dimensional space to 2 dimensions.  Neat!
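
The word filtering described in the notebook README above can be sketched in plain Python. This is a minimal illustration, not the code from `yelp.ipynb`: the function name `log_ratios`, the whitespace tokenization, the add-one smoothing, and the `min_abs_log` threshold are all assumptions made for the example. Only the cutoff of 50 occurrences comes from the README.

```python
import math
from collections import Counter


def log_ratios(pos_reviews, neg_reviews, min_count=50, min_abs_log=0.1):
    """Return {word: log(pos_count / neg_count)} for informative words.

    Hypothetical helper: the tokenization, smoothing, and thresholds are
    assumptions, not taken from the notebook.
    """
    # Count how often each word occurs in positive and in negative reviews.
    pos = Counter(w for review in pos_reviews for w in review.lower().split())
    neg = Counter(w for review in neg_reviews for w in review.lower().split())

    scores = {}
    for word in set(pos) | set(neg):
        # Frequency cutoff: drop words that occur too rarely overall.
        if pos[word] + neg[word] < min_count:
            continue
        # Add-one smoothing (an assumption) avoids division by zero and log(0).
        ratio = (pos[word] + 1) / (neg[word] + 1)
        score = math.log(ratio)  # 0 for neutral words, signed otherwise
        # Cut out the uninformative area around 0.
        if abs(score) < min_abs_log:
            continue
        scores[word] = score
    return scores


# Tiny usage example with made-up reviews and lowered thresholds.
pos = ["great food great service", "great place"]
neg = ["terrible food", "terrible terrible service"]
scores = log_ratios(pos, neg, min_count=2, min_abs_log=0.5)
```

With these toy inputs, neutral words such as "food" and "service" fall into the band around 0 and are dropped, while "great" keeps a positive score and "terrible" a negative one.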
--- a/demos/yelp_demo/notebooks/noise.png
+++ b/demos/yelp_demo/notebooks/noise.png
--- a/demos/yelp_demo/notebooks/semantics.png
+++ b/demos/yelp_demo/notebooks/semantics.png
--- a/demos/yelp_demo/notebooks/signal.png
+++ b/demos/yelp_demo/notebooks/signal.png
--- a/demos/yelp_demo/notebooks/yelp.ipynb
+++ b/demos/yelp_demo/notebooks/yelp.ipynb