mirror of https://github.com/kubeflow/examples.git

Edit navigation and markdown for github example (#93)

* edit TF example readme
* prefix tutorial steps with a number for nicer display in repo
* fix typo
* edit steps 4 and 5
* edit docs
* add navigation and formatting edits to example

parent 7434bb55ba
commit 0b303e70f1

README.md | 20
@@ -1,4 +1,18 @@
-## A repository to host extended examples and tutorials for kubeflow.
-
-1. [Github issue summarization using sequence-to-sequence learning](./github_issue_summarization) by [Hamel Husain](https://github.com/hamelsmu)
-1. [MNIST example using S3 for Training, Serving, and Tensorboard monitoring. Automated using Argo and Kubeflow](./mnist) by [Elson Rodriguez](https://github.com/elsonrodriguez)
+# kubeflow-examples
+
+A repository to share extended Kubeflow examples and tutorials to demonstrate
+Machine Learning concepts, data science workflow, and Kubeflow deployment.
+
+## Examples
+
+1. [GitHub issue summarization using sequence-to-sequence learning](./github_issue_summarization)
+   by [Hamel Husain](https://github.com/hamelsmu)
+   - Machine learning: sequence-to-sequence learning, Keras, Tensorflow
+   - Deployment concepts: end-to-end Kubeflow, Jupyter notebooks,
+     shared persistent storage, Tensorflow and GPUs, Seldon core deployment,
+     Flask front-end
+
+1. [MNIST example using S3 for Training, Serving, and Tensorboard monitoring. Automated using Argo and Kubeflow](./mnist)
+   by [Elson Rodriguez](https://github.com/elsonrodriguez)
+   - Machine learning concepts: MNIST
+   - Deployment concepts: Argo and Kubeflow automation, S3
@@ -5,7 +5,7 @@ In this part, you will setup kubeflow on an existing kubernetes cluster.
 ## Requirements

 * A kubernetes cluster
-* `kubectl` CLI pointing to the kubernetes cluster
+* `kubectl` CLI (command line interface) pointing to the kubernetes cluster
 * Make sure that you can run `kubectl get nodes` from your terminal
   successfully
 * The ksonnet CLI, v0.9.2 or higher: [ks](https://ksonnet.io/#get-started)
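The requirements hunk above lists the CLIs this tutorial depends on. A minimal preflight sketch could verify they are on `PATH` before going further; the `check_cli` helper here is illustrative, not part of the example repo.

```shell
# Hypothetical preflight check for the required CLIs; `check_cli` is an
# illustrative helper, not part of the kubeflow/examples repo.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_cli kubectl
check_cli ks
```

If both report `found`, finish the check by running `kubectl get nodes`, as the requirements list suggests.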
@@ -14,8 +14,9 @@ In this part, you will setup kubeflow on an existing kubernetes cluster.

 Refer to the [user
 guide](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md) for
-instructions on how to setup kubeflow on your kubernetes cluster. Specifically,
-complete the following sections:
+detailed instructions on how to setup kubeflow on your kubernetes cluster.
+Specifically, complete the following sections:

 * [Deploy
   Kubeflow](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md#deploy-kubeflow)
+  * The `ks-kubeflow` directory can be used instead of creating a ksonnet
@@ -43,9 +44,9 @@ Notebook](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md#bringin
 After completing that, you should have the following ready:

 * A ksonnet app in a directory named `ks-kubeflow`
-* An output similar to this for `kubectl get pods`
+* An output similar to this for the `kubectl get pods` command

-```
+```commandline
 NAME                               READY     STATUS    RESTARTS   AGE
 ambassador-75bb54594-dnxsd         2/2       Running   0          3m
 ambassador-75bb54594-hjj6m         2/2       Running   0          3m
@@ -57,7 +58,7 @@ tf-job-dashboard-6c757d8684-d299l  1/1       Running   0          3m
 tf-job-operator-77776c8446-lpprm   1/1       Running   0          3m
 ```

-* A Jupyter Notebook accessible at `http://127.0.0.1:8000`
+* A Jupyter Notebook accessible at http://127.0.0.1:8000
 * A 10GB mount `/mnt/github-issues-data` in your Jupyter Notebook pod. Check this
   by running `!df` in your Jupyter Notebook.
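The expected `kubectl get pods` output shown above can also be checked mechanically. The snippet below counts pods whose STATUS column reads `Running`, using a hard-coded `SAMPLE` string as a stand-in for live cluster output (with a real cluster you would pipe `kubectl get pods` in instead).

```shell
# Count pods whose STATUS column is "Running". SAMPLE is a stand-in for
# real `kubectl get pods` output, so this sketch runs without a cluster.
SAMPLE='NAME                               READY   STATUS    RESTARTS   AGE
ambassador-75bb54594-dnxsd         2/2     Running   0          3m
tf-job-operator-77776c8446-lpprm   1/1     Running   0          3m'

# Skip the header row (NR > 1) and keep rows whose third field is "Running".
running=$(printf '%s\n' "$SAMPLE" | awk 'NR > 1 && $3 == "Running"' | wc -l)
echo "Running pods: $running"
```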
@@ -68,4 +69,4 @@ tf-job-operator-77776c8446-lpprm 1/1 Running 0
 * We created a disk for storing our training data
 * We connected to JupyterHub and spawned a new Jupyter notebook

-Next: [Training the model using our cluster](training_the_model.md)
+*Next*: [Training the model](02_training_the_model.md)
@@ -79,3 +79,11 @@ You can view the logs of the tf-job operator using
 ```commandline
 kubectl logs -f $(kubectl get pods -n=${NAMESPACE} -lname=tf-job-operator -o=jsonpath='{.items[0].metadata.name}')
 ```
+
+For information on:
+- [Training the model](02_training_the_model.md)
+- [Training the model using TFJob](02_training_model_tfjob.md)
+
+*Next*: [Serving the model](03_serving_the_model.md)
+
+*Back*: [Setup a kubeflow cluster](01_setup_a_kubeflow_cluster.md)
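The log-follow one-liner in the hunk above can be split into steps, which is easier to debug when the label selector matches nothing. The `kubeflow` namespace default below is an assumption for illustration; the tutorial leaves `NAMESPACE` up to you.

```shell
# Step-by-step version of the log-follow one-liner; assumes kubectl is
# configured for your cluster. The "kubeflow" default is illustrative only.
NAMESPACE=${NAMESPACE:-kubeflow}

if command -v kubectl >/dev/null 2>&1; then
  # Resolve the first pod matching the tf-job-operator label selector...
  POD=$(kubectl get pods -n="${NAMESPACE}" -lname=tf-job-operator \
        -o=jsonpath='{.items[0].metadata.name}' 2>/dev/null) || true
  # ...then print its logs (add -f to follow, as in the original command).
  if [ -n "$POD" ]; then
    kubectl logs "$POD"
  else
    echo "no tf-job-operator pod found in ${NAMESPACE}"
  fi
else
  echo "kubectl not installed"
fi
```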
@@ -1,25 +1,34 @@
 # Training the model

-By this point, you should have a Jupyter Notebook running at `http://127.0.0.1:8000`.
+By this point, you should have a Jupyter Notebook running at http://127.0.0.1:8000.

 ## Download training files

-Open the Jupyter Notebook interface and create a new Terminal by clicking on New -> Terminal. In the Terminal, clone this git repo by executing: `git clone https://github.com/kubeflow/examples.git`.
+Open the Jupyter Notebook interface and create a new Terminal by clicking on the
+menu, *New -> Terminal*. In the Terminal, clone this git repo by executing:
+
+```commandline
+git clone https://github.com/kubeflow/examples.git
+```

-Now you should have all the code required to complete training in the `examples/github_issue_summarization/notebooks` folder. Navigate to this folder. Here you should see two files:
+Now you should have all the code required to complete training in the
+`examples/github_issue_summarization/notebooks` folder. Navigate to this folder.
+Here you should see two files:

 * `Training.ipynb`
 * `seq2seq_utils.py`

 ## Perform training

-Open `Training.ipynb`. This contains a complete walk-through of downloading the training data, preprocessing it and training it.
+Open the `Training.ipynb` notebook. This contains a complete walk-through of
+downloading the training data, preprocessing it, and training it.

-Run the `Training.ipynb` notebook, viewing the output at each step to confirm that the resulting models produce sensible predictions.
+Run the `Training.ipynb` notebook, viewing the output at each step to confirm
+that the resulting models produce sensible predictions.

 ## Export trained model files

-After training completes, download the resulting files to your local machine. The following files are needed for serving:
+After training completes, download the resulting files to your local machine.
+The following files are needed for serving results:

 * `seq2seq_model_tutorial.h5` - the keras model
 * `body_pp.dpkl` - the serialized body preprocessor
@@ -35,6 +44,10 @@ kubectl --namespace=${NAMESPACE} cp ${PODNAME}:/home/jovyan/examples/github_issu
 kubectl --namespace=${NAMESPACE} cp ${PODNAME}:/home/jovyan/examples/github_issue_summarization/notebooks/title_pp.dpkl .
 ```

-Next: [Serving the model](serving_the_model.md)
+For information on:
+- [Training the model using TFJob](02_training_model_tfjob.md)
+- [Distributed training using tensor2tensor](02_tensor2tensor_training.md)
+
+*Next*: [Serving the model](03_serving_the_model.md)
+
+*Back*: [Setup a kubeflow cluster](01_setup_a_kubeflow_cluster.md)
@@ -117,3 +117,11 @@ You can view the actual training logs using
 ```commandline
 kubectl logs -f $(kubectl get pods -n=${NAMESPACE} -ltf_job_name=tf-job-issue-summarization -o=jsonpath='{.items[0].metadata.name}')
 ```
+
+For information on:
+- [Training the model](02_training_the_model.md)
+- [Distributed training using tensor2tensor](02_tensor2tensor_training.md)
+
+*Next*: [Serving the model](03_serving_the_model.md)
+
+*Back*: [Setup a kubeflow cluster](01_setup_a_kubeflow_cluster.md)
@@ -118,4 +118,6 @@ Response
 }
 ```

-Next: [Querying the model](querying_the_model.md)
+*Next*: [Querying the model](04_querying_the_model.md)
+
+*Back*: [Training the model](02_training_the_model.md)
@@ -4,37 +4,39 @@ In this section, you will setup a barebones web server that displays the
 prediction provided by the previously deployed model.

 The following steps describe how to build a docker image and deploy it locally,
-where it accepts as input any arbitrary text and displays a
-machine-generated summary.
+where it accepts as input any arbitrary text and displays a machine-generated
+summary.

 ## Prerequisites

 Ensure that your model is live and listening for HTTP requests as described in
-[serving](serving_the_model.md).
+[serving](03_serving_the_model.md).

-## Build the frontend image
+## Build the front-end docker image

-To build the frontend image, issue the following commands:
+To build the front-end docker image, issue the following commands:

-```
+```commandline
 cd docker
 docker build -t gcr.io/gcr-repository-name/issue-summarization-ui:0.1 .
 ```

-## Store the frontend image
+## Store the front-end docker image

-To store the image in a location accessible to GKE, push it to the container
-registry of your choice. Here, it is pushed to Google Container Registry.
+To store the docker image in a location accessible to GKE, push it to the
+container registry of your choice. Here, it is pushed to Google Container
+Registry.

-```
+```commandline
 gcloud docker -- push gcr.io/gcr-repository-name/issue-summarization-ui:0.1
 ```

-## Deploy the frontend image to your kubernetes cluster
+## Deploy the front-end docker image to your kubernetes cluster

-The folder [ks-kubeflow](ks-kubeflow) contains a ksonnet app. The ui component in the ks-kubeflow app contains the frontend image deployment.
+The folder [ks-kubeflow](ks-kubeflow) contains a ksonnet app. The ui component
+in the `ks-kubeflow` app contains the frontend image deployment.

 To avoid rate-limiting by the GitHub API, you will need an [authentication token](https://github.com/ksonnet/ksonnet/blob/master/docs/troubleshooting.md) stored in the form of an environment variable `${GITHUB_TOKEN}`. The token does not require any permissions and is only used to prevent anonymous API calls.
@@ -45,7 +47,7 @@ cd ks-kubeflow
 ks param set ui github_token ${GITHUB_TOKEN} --env ${KF_ENV}
 ```

-To serve the frontend interface, apply the ui component of the ksonnet app:
+To serve the frontend interface, apply the `ui` component of the ksonnet app:

 ```
 ks apply ${KF_ENV} -c ui
@@ -53,15 +55,17 @@ ks apply ${KF_ENV} -c ui

 ## View results from the frontend

-We use ambassador to route requests to the frontend. You can port-forward the ambassador container locally:
+We use `ambassador` to route requests to the frontend. You can port-forward the
+ambassador container locally:

-```
+```commandline
 kubectl port-forward $(kubectl get pods -n ${NAMESPACE} -l service=ambassador -o jsonpath='{.items[0].metadata.name}') -n ${NAMESPACE} 8080:80
 ```

-In a browser, navigate to `http://localhost:8080/issue-summarization/`, where you will be greeted by "Issue
-text" Enter text into the input box and click submit. You should see a
-summary that was provided by your trained model.
+In a browser, navigate to `http://localhost:8080/issue-summarization/`, where
+you will be greeted by "Issue text". Enter text into the input box and click
+"Submit". You should see a summary that was provided by your trained model.

-Next: [Teardown](teardown.md)
+*Next*: [Teardown](05_teardown.md)
+
+*Back*: [Serving the Model](03_serving_the_model.md)
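Once the port-forward from the hunk above is running, a curl smoke test (hypothetical, not part of the tutorial) can confirm the UI answers before you open a browser:

```shell
# Smoke-test the forwarded front-end; assumes the port-forward command above
# is running in another terminal. A 200 status code means the UI is reachable.
URL="http://localhost:8080/issue-summarization/"

if command -v curl >/dev/null 2>&1; then
  curl -s -o /dev/null -w '%{http_code}\n' "$URL" || true
else
  echo "curl not installed"
fi
```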
@@ -0,0 +1,21 @@
+# Teardown
+
+Delete the kubernetes `namespace`.
+
+```commandline
+kubectl delete namespace ${NAMESPACE}
+```
+
+Delete the PD (persistent disk) backing the NFS mount.
+
+```commandline
+gcloud --project=${PROJECT} compute disks delete --zone=${ZONE} ${PD_DISK_NAME}
+```
+
+Delete the local `my-kubeflow` app directory.
+
+```commandline
+rm -rf my-kubeflow
+```
+
+*Back*: [Querying the model](04_querying_the_model.md)
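The new teardown page's three steps can be combined into one guarded script. `NAMESPACE`, `PROJECT`, `ZONE`, and `PD_DISK_NAME` are assumed to be set from earlier steps; the guards are an illustrative addition so the script degrades gracefully when a CLI is missing.

```shell
# Combined teardown sketch. Cluster and disk deletion need kubectl/gcloud;
# the local cleanup is safe to re-run because rm -rf ignores a missing dir.
if command -v kubectl >/dev/null 2>&1; then
  kubectl delete namespace "${NAMESPACE}" || true
fi
if command -v gcloud >/dev/null 2>&1; then
  gcloud --project="${PROJECT}" compute disks delete \
    --zone="${ZONE}" "${PD_DISK_NAME}" --quiet || true
fi
rm -rf my-kubeflow
echo "teardown finished"
```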
@@ -10,8 +10,8 @@ Models"](https://medium.com/@hamelhusain/how-to-create-data-products-that-are-ma
 There are two primary goals for this tutorial:

-* End-to-End kubeflow example
-* End-to-End Sequence-to-Sequence model
+* Demonstrate an End-to-End kubeflow example
+* Present an End-to-End Sequence-to-Sequence model

 By the end of this tutorial, you should learn how to:
@@ -19,18 +19,18 @@ By the end of this tutorial, you should learn how to:
 * Spawn up a Jupyter Notebook on the cluster
 * Spawn up a shared-persistent storage across the cluster to store large
   datasets
-* Train a Sequence-to-Sequence model using TensorFlow on the cluster using
-  GPUs
+* Train a Sequence-to-Sequence model using TensorFlow and GPUs on the cluster
 * Serve the model using [Seldon Core](https://github.com/SeldonIO/seldon-core/)
 * Query the model from a simple front-end application

 ## Steps:

-1. [Setup a Kubeflow cluster](setup_a_kubeflow_cluster.md)
-1. Training the model. You can train the model either using Jupyter Notebook or using TFJob.
-   1. [Training the model using a Jupyter Notebook](training_the_model.md)
-   1. [Training the model using TFJob](training_the_model_tfjob.md)
-   1. [Distributed Training using tensor2tensor and TFJob](tensor2tensor_training.md)
-1. [Serving the model](serving_the_model.md)
-1. [Querying the model](querying_the_model.md)
-1. [Teardown](teardown.md)
+1. [Setup a Kubeflow cluster](01_setup_a_kubeflow_cluster.md)
+1. Training the model. You can train the model using any of the following
+   methods, with a Jupyter Notebook or with TFJob:
+   - [Training the model using a Jupyter Notebook](02_training_the_model.md)
+   - [Training the model using TFJob](02_training_the_model_tfjob.md)
+   - [Distributed Training using tensor2tensor and TFJob](02_tensor2tensor_training.md)
+1. [Serving the model](03_serving_the_model.md)
+1. [Querying the model](04_querying_the_model.md)
+1. [Teardown](05_teardown.md)
@@ -1,15 +0,0 @@
-# Teardown
-
-Delete the kubernetes namespace
-
-```
-kubectl delete namespace ${NAMESPACE}
-```
-
-Delete the PD backing the NFS mount
-
-```
-gcloud --project=${PROJECT} compute disks delete --zone=${ZONE} ${PD_DISK_NAME}
-
-```