mirror of https://github.com/kubeflow/examples.git
Merge remote-tracking branch 'upstream/master' into third-party
commit 8e3ddb2eec
@@ -1,13 +1,29 @@

Removed (previous README):

## Sequence-to-Sequence Tutorial with Github Issues Data

Code For Medium Article: ["How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"](https://medium.com/@hamelhusain/how-to-create-data-products-that-are-magical-using-sequence-to-sequence-models-703f86a231f8)

## Resources:

1. [Tutorial Notebook](https://nbviewer.jupyter.org/github/hamelsmu/Seq2Seq_Tutorial/blob/master/notebooks/Tutorial.ipynb): The Jupyter notebook that coincides with the Medium post.
2. [seq2seq_utils.py](./notebooks/seq2seq_utils.py): convenience functions that are used in the tutorial notebook to make predictions.
3. [ktext](https://github.com/hamelsmu/ktext): this library is used in the tutorial to clean data. This library can be installed with `pip`.
4. [Nvidia Docker Container](https://hub.docker.com/r/hamelsmu/seq2seq_tutorial/): contains all libraries that are required to run the tutorial. This container is built with Nvidia-Docker v1.0. You can run this container by executing `nvidia-docker run hamelsmu/seq2seq_tutorial` after installing **Nvidia-Docker v1.0**. Note: this has not been tested on Nvidia-Docker v2.0.

Added (new README):

# [WIP] End-to-End kubeflow tutorial using a Sequence-to-Sequence model

This example demonstrates how you can use `kubeflow` end-to-end to train and serve a Sequence-to-Sequence model on an existing Kubernetes cluster. The tutorial is based on @hamelsmu's article ["How To Create Data Products That Are Magical Using Sequence-to-Sequence Models"](https://medium.com/@hamelhusain/how-to-create-data-products-that-are-magical-using-sequence-to-sequence-models-703f86a231f8).

## Goals

There are two primary goals for this tutorial:

* End-to-End kubeflow example
* End-to-End Sequence-to-Sequence model

By the end of this tutorial, you should know how to:

* Set up a Kubeflow cluster on an existing Kubernetes deployment
* Spawn a Jupyter Notebook on the cluster
* Spawn shared persistent storage across the cluster to store large datasets
* Train a Sequence-to-Sequence model with TensorFlow on the cluster using GPUs
* Serve the model using TensorFlow Serving

## Steps:

1. [Setup a Kubeflow cluster](setup_a_kubeflow_cluster.md)
1. [Teardown](teardown.md)

@@ -0,0 +1,66 @@

# Setup Kubeflow

In this part, you will set up Kubeflow on an existing Kubernetes cluster.

## Requirements

* A Kubernetes cluster
* The `kubectl` CLI pointing to the Kubernetes cluster
* Make sure that you can run `kubectl get nodes` from your terminal successfully
* The ksonnet CLI: [ks](https://ksonnet.io/#get-started)

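As a quick sanity check (a minimal sketch; both commands should succeed before you continue), verify the cluster connection and the ksonnet CLI from your terminal:

```
# Confirm kubectl is pointing at the right cluster
kubectl get nodes

# Confirm the ksonnet CLI is installed
ks version
```
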
Refer to the [user guide](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md) for instructions on how to set up Kubeflow on your Kubernetes cluster. Specifically, complete the [Deploy Kubeflow](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md#deploy-kubeflow) section and the [Bringing up a Notebook](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md#bringing-up-a-notebook) section.

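For orientation, the Deploy Kubeflow section boils down to roughly the following ksonnet workflow. This is only a sketch; the exact registry path, package versions, and prototype parameters are defined in the user guide and may have changed:

```
# Create the ksonnet app (this produces the my-kubeflow directory referenced below)
ks init my-kubeflow
cd my-kubeflow

# Point the app at the Kubeflow ksonnet registry and install the core package
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/core

# Generate the kubeflow-core component and deploy it into your namespace
ks generate core kubeflow-core --name=kubeflow-core --namespace=${NAMESPACE}
ks apply default -c kubeflow-core
```
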
After completing those sections, you should have the following ready:

* A ksonnet app in a directory named `my-kubeflow`
* Output similar to the following from `kubectl get pods`:

```
NAME                              READY     STATUS    RESTARTS   AGE
ambassador-7987df44b9-4pht8       2/2       Running   0          1m
ambassador-7987df44b9-dh5h6       2/2       Running   0          1m
ambassador-7987df44b9-qrgsm       2/2       Running   0          1m
tf-hub-0                          1/1       Running   0          1m
tf-job-operator-78757955b-qkg7s   1/1       Running   0          1m
```

* A Jupyter Notebook accessible at `http://127.0.0.1:8000` (see the sketch below for one way to reach it)

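If you have not exposed JupyterHub externally, one way to reach it locally is to port-forward to the hub pod (this is an assumption about your setup; the user guide's "Bringing up a Notebook" section covers the supported connection options):

```
# JupyterHub serves on port 8000 inside the tf-hub-0 pod
kubectl port-forward tf-hub-0 8000:8000 -n=${NAMESPACE}
```

Then browse to `http://127.0.0.1:8000`.
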
## Provision storage for training data

We need a shared persistent disk to store our training data, since containers' filesystems are ephemeral and don't have a lot of storage space.

The [Advanced Customization](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md#advanced-customization) section of the [user guide](https://github.com/kubeflow/kubeflow/blob/master/user_guide.md) has instructions on how to provision a cluster-wide shared NFS.

For this example, provision a `10GB` NFS mount with the name `github-issues-data`.

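How you provision the NFS depends on your cluster. On GKE, for example, the share is typically backed by a GCE persistent disk; the sketch below assumes that setup, and `${PD_DISK_NAME}` is whatever name you choose (the teardown step later deletes this same disk):

```
# Create the persistent disk that will back the NFS share (name and zone are yours to choose)
gcloud --project=${PROJECT} compute disks create --zone=${ZONE} --size=10GB ${PD_DISK_NAME}
```
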
After the NFS is ready, delete the `tf-hub-0` pod so that it gets recreated and picks up the NFS mount. You can delete it by running `kubectl delete pod tf-hub-0 -n=${NAMESPACE}`.

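To confirm the pod comes back and picks up the mount, you can watch it restart and then inspect its volumes (a sketch; the exact volume name depends on how the NFS was provisioned):

```
kubectl delete pod tf-hub-0 -n=${NAMESPACE}

# Wait for the pod to be recreated, then watch its status
kubectl get pods tf-hub-0 -n=${NAMESPACE} -w

# Optionally inspect the pod to confirm the NFS volume is mounted
kubectl describe pod tf-hub-0 -n=${NAMESPACE}
```
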
At this point you should have a 10GB mount at `/mnt/github-issues-data` in your Jupyter Notebook pod. You can check this by running `!df` in your Jupyter Notebook.

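In a notebook cell, that check looks like this (a sketch; `-h` just makes the sizes human-readable):

```
!df -h /mnt/github-issues-data
```
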
## Summary

* We created a ksonnet app for our kubeflow deployment
* We created a disk for storing our training data
* We deployed the kubeflow-core component to our Kubernetes cluster
* We connected to JupyterHub and spawned a new Jupyter notebook

Next: [Training the model using our cluster](training_the_model.md)

@@ -0,0 +1,20 @@

# Teardown

Delete the Kubernetes namespace:

```
kubectl delete namespace ${NAMESPACE}
```

Delete the PD backing the NFS mount:

```
gcloud --project=${PROJECT} compute disks delete --zone=${ZONE} ${PD_DISK_NAME}
```

Delete the kubeflow app directory:

```
rm -rf my-kubeflow
```