mirror of https://github.com/kubeflow/examples.git
Mnist pipelines (#524)
* added mnist pipelines sample
* fixed lint issues
This commit is contained in:
parent
7924e0fe21
commit
895e88bf67
@@ -0,0 +1,2 @@
venv
*.tar.gz
@@ -0,0 +1,187 @@
# MNIST Pipelines GCP

This document describes how to run the [MNIST example](https://github.com/kubeflow/examples/tree/master/mnist) on Kubeflow Pipelines on a Google Cloud Platform cluster.

## Setup

#### Create a GCS bucket

This pipeline requires a [Google Cloud Storage bucket](https://cloud.google.com/storage/) to hold your trained model. You can create one with the following commands:
```
BUCKET_NAME=kubeflow-pipeline-demo-$(date +%s)
gsutil mb gs://$BUCKET_NAME/
```

#### Deploy Kubeflow

Follow the [Getting Started Guide](https://www.kubeflow.org/docs/started/getting-started-gke) to deploy a Kubeflow cluster to GKE.

#### Open the Kubeflow Pipelines UI



##### IAP enabled
If you set up your cluster with IAP enabled as described in the [GKE Getting Started guide](https://www.kubeflow.org/docs/started/getting-started-gke),
you can now access the Kubeflow Pipelines UI at `https://<deployment_name>.endpoints.<project>.cloud.goog/pipeline`.

##### IAP disabled
If you opted to skip IAP, you can open a connection to the UI by running *kubectl port-forward* and browsing to http://localhost:8085/pipeline

```
kubectl port-forward -n kubeflow $(kubectl get pods -n kubeflow --selector=service=ambassador \
    -o jsonpath='{.items[0].metadata.name}') 8085:80
```

#### Install Python Dependencies

Set up a [virtual environment](https://docs.python.org/3/tutorial/venv.html) for your Kubeflow Pipelines work:

```
python3 -m venv $(pwd)/venv
source ./venv/bin/activate
```

Install the Kubeflow Pipelines SDK, along with the other Python dependencies listed in the [requirements.txt](./requirements.txt) file:

```
pip install -r requirements.txt --upgrade
```

## Running the Pipeline

#### Compile Pipeline
Pipelines are written in Python, but they must be compiled into a [domain-specific language (DSL)](https://en.wikipedia.org/wiki/Domain-specific_language) representation
before they can be used. Most pipelines are designed so that simply running the script will perform the compilation step:
```
python3 mnist-pipeline.py
```
Running this command should produce a compiled *mnist-pipeline.py.tar.gz* file.

Alternatively, you can compile manually using the *dsl-compile* script:

```
python venv/bin/dsl-compile --py mnist-pipeline.py --output mnist-pipeline.py.tar.gz
```

#### Upload through the UI

Now that you have the compiled pipeline file, you can upload it through the Kubeflow Pipelines UI.
Simply select the "Upload pipeline" button:



Upload your file and give it a name:



#### Run the Pipeline

After clicking on the newly created pipeline, you should be presented with an overview of the pipeline graph.
When you're ready, select the "Create Run" button to launch the pipeline.



Fill out the information required for the run, including the GCS bucket (`$BUCKET_NAME`) you created earlier. Press "Start" when you are ready.

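The run parameters correspond to the arguments of the `mnist_pipeline` function. As a rough guide (substitute the bucket you created above; the other values are the defaults from the function header), they might look like:

```
model_export_dir = gs://<your-bucket-name>/export
train_steps      = 200
learning_rate    = 0.01
batch_size       = 100
```
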

|
||||
|
||||
After clicking on the newly created Run, you should see the pipeline run through the 'train', 'serve', and 'web-ui' components. Click on any component to see its logs.
|
||||
When the pipeline is complete, look at the logs for the web-ui component to find the IP address created for the MNIST web interface
|
||||
|
||||

|
||||
|
||||
## Pipeline Breakdown
|
||||
|
||||
Now that we've run a pipeline, lets break down how it works
|
||||
|
||||
#### Decorator
|
||||
```
|
||||
@dsl.pipeline(
|
||||
name='MNIST',
|
||||
description='A pipeline to train and serve the MNIST example.'
|
||||
)
|
||||
```
|
||||
Pipelines are expected to include a `@dsl.pipeline` decorator to provide metadata about the pipeline
|
||||
|
||||
#### Function Header
|
||||
```
|
||||
def mnist_pipeline(model_export_dir='gs://your-bucket/export',
|
||||
train_steps='200',
|
||||
learning_rate='0.01',
|
||||
batch_size='100'):
|
||||
```
|
||||
The pipeline is defined in the mnist_pipeline function. It includes a number of arguments, which are exposed in the Kubeflow Pipelines UI when creating a new Run.
|
||||
Although passed as strings, these arguments are of type [`kfp.dsl.PipelineParam`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_pipeline_param.py)
|
||||
|
||||
#### Train
|
||||
```
|
||||
train = dsl.ContainerOp(
|
||||
name='train',
|
||||
image='gcr.io/kubeflow-examples/mnist/model:v20190304-v0.2-176-g15d997b',
|
||||
arguments=[
|
||||
"/opt/model.py",
|
||||
"--tf-export-dir", model_export_dir,
|
||||
"--tf-train-steps", train_steps,
|
||||
"--tf-batch-size", batch_size,
|
||||
"--tf-learning-rate", learning_rate
|
||||
]
|
||||
).apply(gcp.use_gcp_secret('user-gcp-sa'))
|
||||
```
|
||||
This block defines the 'train' component. A component is made up of a [`kfp.dsl.ContainerOp`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_container_op.py)
|
||||
object with the container path and a name specified. The container image used is defined in the [Dockerfile.model in the MNIST example](https://github.com/kubeflow/examples/blob/master/mnist/Dockerfile.model)
|
||||
|
||||
Because the training component needs access to our GCS bucket, it is run with access to our 'user-gcp-sa' secret, which gives
|
||||
read/write access to GCS resources.
|
||||
After defining the train component, we also set a number of environment variables for the training script
|
||||
|
||||
#### Serve
|
||||
```
|
||||
serve = dsl.ContainerOp(
|
||||
name='serve',
|
||||
image='gcr.io/ml-pipeline/ml-pipeline-kubeflow-deployer:\
|
||||
7775692adf28d6f79098e76e839986c9ee55dd61',
|
||||
arguments=[
|
||||
'--model-export-path', model_export_dir,
|
||||
'--server-name', "mnist-service"
|
||||
]
|
||||
).apply(gcp.use_gcp_secret('user-gcp-sa'))
|
||||
```
|
||||
The 'serve' component is slightly different than 'train'. While 'train' runs a single container and then exits, 'serve' runs a container that launches long-living
|
||||
resources in the cluster. The ContainerOP takes two arguments: the path we exported our trained model to, and a server name. Using these, this pipeline component
|
||||
creates a Kubeflow [`tf-serving`](https://github.com/kubeflow/kubeflow/tree/master/kubeflow/tf-serving) service within the cluster. This service lives after the
|
||||
pipeline is complete, and can be seen using `kubectl get all -n kubeflow`. The Dockerfile used to build this container [can be found here](https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/deployer/Dockerfile).
|
||||
Like the 'train' component, 'serve' requires access to the 'user-gcp-sa' secret for access to the 'kubectl' command within the container.
|
||||
|
||||
The `serve.after(train)` line specifies that this component is to run sequentially after 'train' is complete
|
||||
|
||||
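For reference, this ordering appears in *mnist-pipeline.py* as two calls placed after the corresponding component definitions:

```
serve.after(train)   # 'serve' starts only once 'train' has finished
web_ui.after(serve)  # 'web-ui' (described below) starts only once 'serve' has finished
```
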
#### Web UI
```
web_ui = dsl.ContainerOp(
    name='web-ui',
    image='gcr.io/kubeflow-examples/mnist/deploy-service:latest',
    arguments=[
        '--image', 'gcr.io/kubeflow-examples/mnist/web-ui:\
v20190304-v0.2-176-g15d997b-pipelines',
        '--name', 'web-ui',
        '--container-port', '5000',
        '--service-port', '80',
        '--service-type', "LoadBalancer"
    ]
).apply(gcp.use_gcp_secret('user-gcp-sa'))

web_ui.after(serve)
```
Like 'serve', the web-ui component launches a service that lives on after the pipeline is complete. Instead of launching a Kubeflow resource, the web-ui component launches
a standard Kubernetes Deployment/Service pair. The Dockerfile that builds the deployment image [can be found here](./deploy-service/Dockerfile). This image is used
to deploy the web UI, which was built from the [Dockerfile found in the MNIST example](https://github.com/kubeflow/examples/blob/master/mnist/web-ui/Dockerfile).

After this component is run, a new LoadBalancer is provisioned that gives external access to a 'web-ui' deployment launched in the cluster.

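Once the LoadBalancer has been provisioned, you can also query its external IP directly with kubectl instead of reading the component logs. A minimal sketch, assuming the default `kubeflow` namespace and the `web-ui` service name used above:

```
kubectl get svc -n kubeflow web-ui -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
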
#### Main Function
```
if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(mnist_pipeline, __file__ + '.tar.gz')
```

At the bottom of the script is a main function. This is used to compile the pipeline when the script is run.

@@ -0,0 +1,62 @@
# Copyright 2018 The Kubeflow Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM debian

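# Install ksonnet 0.12.0, kubectl v1.11.2, the Kubernetes v1.11.2 source archive,
# and the Google Cloud SDK.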
RUN apt-get update -q && apt-get upgrade -y && \
    apt-get install -y -qq --no-install-recommends \
      apt-transport-https \
      ca-certificates \
      git \
      gnupg \
      lsb-release \
      unzip \
      wget && \
    wget -O /opt/ks_0.12.0_linux_amd64.tar.gz \
      https://github.com/ksonnet/ksonnet/releases/download/v0.12.0/ks_0.12.0_linux_amd64.tar.gz && \
    tar -C /opt -xzf /opt/ks_0.12.0_linux_amd64.tar.gz && \
    cp /opt/ks_0.12.0_linux_amd64/ks /bin/. && \
    rm -f /opt/ks_0.12.0_linux_amd64.tar.gz && \
    wget -O /bin/kubectl \
      https://storage.googleapis.com/kubernetes-release/release/v1.11.2/bin/linux/amd64/kubectl && \
    chmod u+x /bin/kubectl && \
    wget -O /opt/kubernetes_v1.11.2 \
      https://github.com/kubernetes/kubernetes/archive/v1.11.2.tar.gz && \
    mkdir -p /src && \
    tar -C /src -xzf /opt/kubernetes_v1.11.2 && \
    rm -rf /opt/kubernetes_v1.11.2 && \
    wget -O /opt/google-apt-key.gpg \
      https://packages.cloud.google.com/apt/doc/apt-key.gpg && \
    apt-key add /opt/google-apt-key.gpg && \
    export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
    echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" >> \
      /etc/apt/sources.list.d/google-cloud-sdk.list && \
    apt-get update -q && \
    apt-get install -y -qq --no-install-recommends google-cloud-sdk && \
    gcloud config set component_manager/disable_update_check true

ENV KUBEFLOW_VERSION v0.2.5

# Checkout the kubeflow packages at image build time so that we do not
# require calling in to the GitHub API at run time.
RUN cd /src && \
    mkdir -p github.com/kubeflow && \
    cd github.com/kubeflow && \
    git clone https://github.com/kubeflow/kubeflow && \
    cd kubeflow && \
    git checkout ${KUBEFLOW_VERSION}

ADD ./src/deploy.sh /bin/.

ENTRYPOINT ["/bin/deploy.sh"]

@@ -0,0 +1,127 @@
#!/bin/bash -e

# Copyright 2018 The Kubeflow Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -x

KUBERNETES_NAMESPACE="${KUBERNETES_NAMESPACE:-kubeflow}"
NAME="my-deployment"

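# Parse the flags passed in by the pipeline component (image, name, service type/ports, etc.).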
while (($#)); do
    case $1 in
        "--image")
            shift
            IMAGE_PATH="$1"
            shift
            ;;
        "--service-type")
            shift
            SERVICE_TYPE="$1"
            shift
            ;;
        "--container-port")
            shift
            CONTAINER_PORT="--containerPort=$1"
            shift
            ;;
        "--service-port")
            shift
            SERVICE_PORT="--servicePort=$1"
            shift
            ;;
        "--cluster-name")
            shift
            CLUSTER_NAME="$1"
            shift
            ;;
        "--namespace")
            shift
            KUBERNETES_NAMESPACE="$1"
            shift
            ;;
        "--name")
            shift
            NAME="$1"
            shift
            ;;
        *)
            echo "Unknown argument: '$1'"
            exit 1
            ;;
    esac
done

if [ -z "${IMAGE_PATH}" ]; then
    echo "You must specify an image to deploy"
    exit 1
fi

if [ -z "$SERVICE_TYPE" ]; then
    SERVICE_TYPE=ClusterIP
fi

echo "Deploying the image '${IMAGE_PATH}'"

if [ -z "${CLUSTER_NAME}" ]; then
    CLUSTER_NAME=$(wget -q -O- --header="Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-name)
fi

# Ensure the name is not more than 63 characters.
NAME="${NAME:0:63}"
# Trim any trailing hyphens from the server name.
while [[ "${NAME:(-1)}" == "-" ]]; do NAME="${NAME::-1}"; done

echo "Deploying ${NAME} to the cluster ${CLUSTER_NAME}"

# Connect kubectl to the local cluster
kubectl config set-cluster "${CLUSTER_NAME}" --server=https://kubernetes.default --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
kubectl config set-credentials pipeline --token "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
kubectl config set-context kubeflow --cluster "${CLUSTER_NAME}" --user pipeline
kubectl config use-context kubeflow

# Configure and deploy the app
cd /src/github.com/kubeflow/kubeflow
git checkout ${KUBEFLOW_VERSION}

cd /opt
echo "Initializing KSonnet app..."
ks init tf-serving-app
cd tf-serving-app/

if [ -n "${KUBERNETES_NAMESPACE}" ]; then
    echo "Setting Kubernetes namespace: ${KUBERNETES_NAMESPACE} ..."
    ks env set default --namespace "${KUBERNETES_NAMESPACE}"
fi

ks generate deployed-service $NAME --name=$NAME --image=$IMAGE_PATH --type=$SERVICE_TYPE $CONTAINER_PORT $SERVICE_PORT

echo "Deploying the service..."
ks apply default -c $NAME

# Wait for the ip address
timeout="1000"
start_time=`date +%s`
PUBLIC_IP=""
while [ -z "$PUBLIC_IP" ]; do
    PUBLIC_IP=$(kubectl get svc -n $KUBERNETES_NAMESPACE $NAME -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2> /dev/null)
    current_time=`date +%s`
    elapsed_time=$(expr $current_time + 1 - $start_time)
    if [[ $elapsed_time -gt $timeout ]]; then
        echo "timeout"
        exit 1
    fi
    sleep 5
done
echo "service active: $PUBLIC_IP"

Binary image files added (not shown): 133 KiB, 199 KiB, 46 KiB, 79 KiB, 3.4 KiB, 34 KiB.

@@ -0,0 +1,80 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Kubeflow Pipelines MNIST example

Run this script to compile pipeline
"""


import kfp.dsl as dsl
import kfp.gcp as gcp


@dsl.pipeline(
    name='MNIST',
    description='A pipeline to train and serve the MNIST example.'
)
def mnist_pipeline(model_export_dir='gs://your-bucket/export',
                   train_steps='200',
                   learning_rate='0.01',
                   batch_size='100'):
    """
    Pipeline with three stages:
      1. train an MNIST classifier
      2. deploy a tf-serving instance to the cluster
      3. deploy a web-ui to interact with it
    """
    train = dsl.ContainerOp(
        name='train',
        image='gcr.io/kubeflow-examples/mnist/model:v20190304-v0.2-176-g15d997b',
        arguments=[
            "/opt/model.py",
            "--tf-export-dir", model_export_dir,
            "--tf-train-steps", train_steps,
            "--tf-batch-size", batch_size,
            "--tf-learning-rate", learning_rate
        ]
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))

    serve = dsl.ContainerOp(
        name='serve',
        image='gcr.io/ml-pipeline/ml-pipeline-kubeflow-deployer:\
7775692adf28d6f79098e76e839986c9ee55dd61',
        arguments=[
            '--model-export-path', model_export_dir,
            '--server-name', "mnist-service"
        ]
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))
    serve.after(train)

    web_ui = dsl.ContainerOp(
        name='web-ui',
        image='gcr.io/kubeflow-examples/mnist/deploy-service:latest',
        arguments=[
            '--image', 'gcr.io/kubeflow-examples/mnist/web-ui:\
v20190304-v0.2-176-g15d997b-pipelines',
            '--name', 'web-ui',
            '--container-port', '5000',
            '--service-port', '80',
            '--service-type', "LoadBalancer"
        ]
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))

    web_ui.after(serve)


if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(mnist_pipeline, __file__ + '.tar.gz')

@@ -0,0 +1,3 @@
python-dateutil
https://storage.googleapis.com/ml-pipeline/release/0.1.9/kfp.tar.gz
kubernetes==8.0.0