mirror of https://github.com/kubeflow/examples.git
Mnist pipelines (#524)
* added mnist pipelines sample
* fixed lint issues
This commit is contained in:
parent
7924e0fe21
commit
895e88bf67
@@ -0,0 +1,2 @@
venv
*.tar.gz
@@ -0,0 +1,187 @@
# MNIST Pipelines GCP

This document describes how to run the [MNIST example](https://github.com/kubeflow/examples/tree/master/mnist) on Kubeflow Pipelines on a Google Cloud Platform cluster.

## Setup

#### Create a GCS bucket

This pipeline requires a [Google Cloud Storage bucket](https://cloud.google.com/storage/) to hold your trained model. You can create one with the following commands:
```
BUCKET_NAME=kubeflow-pipeline-demo-$(date +%s)
gsutil mb gs://$BUCKET_NAME/
```

#### Deploy Kubeflow

Follow the [Getting Started Guide](https://www.kubeflow.org/docs/started/getting-started-gke) to deploy a Kubeflow cluster to GKE.

#### Open the Kubeflow Pipelines UI



##### IAP enabled
If you set up your cluster with IAP enabled as described in the [GKE Getting Started guide](https://www.kubeflow.org/docs/started/getting-started-gke),
you can now access the Kubeflow Pipelines UI at `https://<deployment_name>.endpoints.<project>.cloud.goog/pipeline`.

##### IAP disabled
If you opted to skip IAP, you can open a connection to the UI by running *kubectl port-forward* and browsing to http://localhost:8085/pipeline

```
kubectl port-forward -n kubeflow $(kubectl get pods -n kubeflow --selector=service=ambassador \
    -o jsonpath='{.items[0].metadata.name}') 8085:80
```

#### Install Python Dependencies

Set up a [virtual environment](https://docs.python.org/3/tutorial/venv.html) for your Kubeflow Pipelines work:

```
python3 -m venv $(pwd)/venv
source ./venv/bin/activate
```

Install the Kubeflow Pipelines SDK, along with the other Python dependencies listed in the [requirements.txt](./requirements.txt) file:

```
pip install -r requirements.txt --upgrade
```

## Running the Pipeline

#### Compile Pipeline
Pipelines are written in Python, but they must be compiled into a [domain-specific language (DSL)](https://en.wikipedia.org/wiki/Domain-specific_language) representation
before they can be used. Most pipelines are designed so that simply running the script will perform the compilation step:
```
python3 mnist-pipeline.py
```
Running this command should produce a compiled *mnist-pipeline.py.tar.gz* file.

Alternatively, you can compile manually using the *dsl-compile* script:

```
python venv/bin/dsl-compile --py mnist-pipeline.py --output mnist-pipeline.py.tar.gz
```

#### Upload through the UI

Now that you have the compiled pipeline file, you can upload it through the Kubeflow Pipelines UI.
Simply select the "Upload pipeline" button:



Upload your file and give it a name:



#### Run the Pipeline

After clicking on the newly created pipeline, you should be presented with an overview of the pipeline graph.
When you're ready, select the "Create Run" button to launch the pipeline.



Fill out the information required for the run, including the GCS bucket (`$BUCKET_NAME`) you created earlier. Press "Start" when you are ready.

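The run parameters correspond to the arguments of the `mnist_pipeline` function. As a rough guide (substitute the bucket you created above; the other values are the defaults from the function header), they might look like:

```
model_export_dir = gs://<your-bucket-name>/export
train_steps      = 200
learning_rate    = 0.01
batch_size       = 100
```
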

|
||||
|
||||
After clicking on the newly created Run, you should see the pipeline run through the 'train', 'serve', and 'web-ui' components. Click on any component to see its logs.
|
||||
When the pipeline is complete, look at the logs for the web-ui component to find the IP address created for the MNIST web interface
|
||||
|
||||

|
||||
|
||||
## Pipeline Breakdown
|
||||
|
||||
Now that we've run a pipeline, lets break down how it works
|
||||
|
||||
#### Decorator
|
||||
```
|
||||
@dsl.pipeline(
|
||||
name='MNIST',
|
||||
description='A pipeline to train and serve the MNIST example.'
|
||||
)
|
||||
```
|
||||
Pipelines are expected to include a `@dsl.pipeline` decorator to provide metadata about the pipeline
|
||||
|
||||
#### Function Header
|
||||
```
|
||||
def mnist_pipeline(model_export_dir='gs://your-bucket/export',
|
||||
train_steps='200',
|
||||
learning_rate='0.01',
|
||||
batch_size='100'):
|
||||
```
|
||||
The pipeline is defined in the mnist_pipeline function. It includes a number of arguments, which are exposed in the Kubeflow Pipelines UI when creating a new Run.
|
||||
Although passed as strings, these arguments are of type [`kfp.dsl.PipelineParam`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_pipeline_param.py)
|
||||
|
||||
#### Train
|
||||
```
|
||||
train = dsl.ContainerOp(
|
||||
name='train',
|
||||
image='gcr.io/kubeflow-examples/mnist/model:v20190304-v0.2-176-g15d997b',
|
||||
arguments=[
|
||||
"/opt/model.py",
|
||||
"--tf-export-dir", model_export_dir,
|
||||
"--tf-train-steps", train_steps,
|
||||
"--tf-batch-size", batch_size,
|
||||
"--tf-learning-rate", learning_rate
|
||||
]
|
||||
).apply(gcp.use_gcp_secret('user-gcp-sa'))
|
||||
```
|
||||
This block defines the 'train' component. A component is made up of a [`kfp.dsl.ContainerOp`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_container_op.py)
|
||||
object with the container path and a name specified. The container image used is defined in the [Dockerfile.model in the MNIST example](https://github.com/kubeflow/examples/blob/master/mnist/Dockerfile.model)
|
||||
|
||||
Because the training component needs access to our GCS bucket, it is run with access to our 'user-gcp-sa' secret, which gives
|
||||
read/write access to GCS resources.
|
||||
After defining the train component, we also set a number of environment variables for the training script
|
||||
|
||||
#### Serve
|
||||
```
|
||||
serve = dsl.ContainerOp(
|
||||
name='serve',
|
||||
image='gcr.io/ml-pipeline/ml-pipeline-kubeflow-deployer:\
|
||||
7775692adf28d6f79098e76e839986c9ee55dd61',
|
||||
arguments=[
|
||||
'--model-export-path', model_export_dir,
|
||||
'--server-name', "mnist-service"
|
||||
]
|
||||
).apply(gcp.use_gcp_secret('user-gcp-sa'))
|
||||
```
|
||||
The 'serve' component is slightly different than 'train'. While 'train' runs a single container and then exits, 'serve' runs a container that launches long-living
|
||||
resources in the cluster. The ContainerOP takes two arguments: the path we exported our trained model to, and a server name. Using these, this pipeline component
|
||||
creates a Kubeflow [`tf-serving`](https://github.com/kubeflow/kubeflow/tree/master/kubeflow/tf-serving) service within the cluster. This service lives after the
|
||||
pipeline is complete, and can be seen using `kubectl get all -n kubeflow`. The Dockerfile used to build this container [can be found here](https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/deployer/Dockerfile).
|
||||
Like the 'train' component, 'serve' requires access to the 'user-gcp-sa' secret for access to the 'kubectl' command within the container.
|
||||
|
||||
The `serve.after(train)` line specifies that this component is to run sequentially after 'train' is complete
|
||||
|
||||
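For reference, this ordering appears in *mnist-pipeline.py* as two calls placed after the corresponding component definitions:

```
serve.after(train)   # 'serve' starts only once 'train' has finished
web_ui.after(serve)  # 'web-ui' (described below) starts only once 'serve' has finished
```
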
#### Web UI
```
web_ui = dsl.ContainerOp(
    name='web-ui',
    image='gcr.io/kubeflow-examples/mnist/deploy-service:latest',
    arguments=[
        '--image', 'gcr.io/kubeflow-examples/mnist/web-ui:\
v20190304-v0.2-176-g15d997b-pipelines',
        '--name', 'web-ui',
        '--container-port', '5000',
        '--service-port', '80',
        '--service-type', "LoadBalancer"
    ]
).apply(gcp.use_gcp_secret('user-gcp-sa'))

web_ui.after(serve)
```
Like 'serve', the web-ui component launches a service that lives on after the pipeline is complete. Instead of launching a Kubeflow resource, the web-ui component launches
a standard Kubernetes Deployment/Service pair. The Dockerfile that builds the deployment image [can be found here](./deploy-service/Dockerfile). This image is used
to deploy the web UI, which was built from the [Dockerfile found in the MNIST example](https://github.com/kubeflow/examples/blob/master/mnist/web-ui/Dockerfile).

After this component is run, a new LoadBalancer is provisioned that gives external access to a 'web-ui' deployment launched in the cluster.

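Once the LoadBalancer has been provisioned, you can also query its external IP directly with kubectl instead of reading the component logs. A minimal sketch, assuming the default `kubeflow` namespace and the `web-ui` service name used above:

```
kubectl get svc -n kubeflow web-ui -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
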
#### Main Function
```
if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(mnist_pipeline, __file__ + '.tar.gz')
```

At the bottom of the script is a main function. This is used to compile the pipeline when the script is run.

@@ -0,0 +1,62 @@
# Copyright 2018 The Kubeflow Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM debian

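# Install ksonnet 0.12.0, kubectl v1.11.2, the Kubernetes v1.11.2 source archive,
# and the Google Cloud SDK.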
RUN apt-get update -q && apt-get upgrade -y && \
    apt-get install -y -qq --no-install-recommends \
      apt-transport-https \
      ca-certificates \
      git \
      gnupg \
      lsb-release \
      unzip \
      wget && \
    wget -O /opt/ks_0.12.0_linux_amd64.tar.gz \
      https://github.com/ksonnet/ksonnet/releases/download/v0.12.0/ks_0.12.0_linux_amd64.tar.gz && \
    tar -C /opt -xzf /opt/ks_0.12.0_linux_amd64.tar.gz && \
    cp /opt/ks_0.12.0_linux_amd64/ks /bin/. && \
    rm -f /opt/ks_0.12.0_linux_amd64.tar.gz && \
    wget -O /bin/kubectl \
      https://storage.googleapis.com/kubernetes-release/release/v1.11.2/bin/linux/amd64/kubectl && \
    chmod u+x /bin/kubectl && \
    wget -O /opt/kubernetes_v1.11.2 \
      https://github.com/kubernetes/kubernetes/archive/v1.11.2.tar.gz && \
    mkdir -p /src && \
    tar -C /src -xzf /opt/kubernetes_v1.11.2 && \
    rm -rf /opt/kubernetes_v1.11.2 && \
    wget -O /opt/google-apt-key.gpg \
      https://packages.cloud.google.com/apt/doc/apt-key.gpg && \
    apt-key add /opt/google-apt-key.gpg && \
    export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
    echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" >> \
      /etc/apt/sources.list.d/google-cloud-sdk.list && \
    apt-get update -q && \
    apt-get install -y -qq --no-install-recommends google-cloud-sdk && \
    gcloud config set component_manager/disable_update_check true

ENV KUBEFLOW_VERSION v0.2.5

# Checkout the kubeflow packages at image build time so that we do not
# require calling in to the GitHub API at run time.
RUN cd /src && \
    mkdir -p github.com/kubeflow && \
    cd github.com/kubeflow && \
    git clone https://github.com/kubeflow/kubeflow && \
    cd kubeflow && \
    git checkout ${KUBEFLOW_VERSION}

ADD ./src/deploy.sh /bin/.

ENTRYPOINT ["/bin/deploy.sh"]

@@ -0,0 +1,127 @@
#!/bin/bash -e

# Copyright 2018 The Kubeflow Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -x

KUBERNETES_NAMESPACE="${KUBERNETES_NAMESPACE:-kubeflow}"
NAME="my-deployment"

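# Parse the flags passed in by the pipeline component (image, name, service type/ports, etc.).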
while (($#)); do
    case $1 in
        "--image")
            shift
            IMAGE_PATH="$1"
            shift
            ;;
        "--service-type")
            shift
            SERVICE_TYPE="$1"
            shift
            ;;
        "--container-port")
            shift
            CONTAINER_PORT="--containerPort=$1"
            shift
            ;;
        "--service-port")
            shift
            SERVICE_PORT="--servicePort=$1"
            shift
            ;;
        "--cluster-name")
            shift
            CLUSTER_NAME="$1"
            shift
            ;;
        "--namespace")
            shift
            KUBERNETES_NAMESPACE="$1"
            shift
            ;;
        "--name")
            shift
            NAME="$1"
            shift
            ;;
        *)
            echo "Unknown argument: '$1'"
            exit 1
            ;;
    esac
done

if [ -z "${IMAGE_PATH}" ]; then
    echo "You must specify an image to deploy"
    exit 1
fi

if [ -z "$SERVICE_TYPE" ]; then
    SERVICE_TYPE=ClusterIP
fi

echo "Deploying the image '${IMAGE_PATH}'"

if [ -z "${CLUSTER_NAME}" ]; then
    CLUSTER_NAME=$(wget -q -O- --header="Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-name)
fi

# Ensure the name is not more than 63 characters.
NAME="${NAME:0:63}"
# Trim any trailing hyphens from the server name.
while [[ "${NAME:(-1)}" == "-" ]]; do NAME="${NAME::-1}"; done

echo "Deploying ${NAME} to the cluster ${CLUSTER_NAME}"

# Connect kubectl to the local cluster
kubectl config set-cluster "${CLUSTER_NAME}" --server=https://kubernetes.default --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
kubectl config set-credentials pipeline --token "$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
kubectl config set-context kubeflow --cluster "${CLUSTER_NAME}" --user pipeline
kubectl config use-context kubeflow

# Configure and deploy the app
cd /src/github.com/kubeflow/kubeflow
git checkout ${KUBEFLOW_VERSION}

cd /opt
echo "Initializing KSonnet app..."
ks init tf-serving-app
cd tf-serving-app/

if [ -n "${KUBERNETES_NAMESPACE}" ]; then
    echo "Setting Kubernetes namespace: ${KUBERNETES_NAMESPACE} ..."
    ks env set default --namespace "${KUBERNETES_NAMESPACE}"
fi

ks generate deployed-service $NAME --name=$NAME --image=$IMAGE_PATH --type=$SERVICE_TYPE $CONTAINER_PORT $SERVICE_PORT

echo "Deploying the service..."
ks apply default -c $NAME

# Wait for the ip address
timeout="1000"
start_time=`date +%s`
PUBLIC_IP=""
while [ -z "$PUBLIC_IP" ]; do
    PUBLIC_IP=$(kubectl get svc -n $KUBERNETES_NAMESPACE $NAME -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2> /dev/null)
    current_time=`date +%s`
    elapsed_time=$(expr $current_time + 1 - $start_time)
    if [[ $elapsed_time -gt $timeout ]]; then
        echo "timeout"
        exit 1
    fi
    sleep 5
done
echo "service active: $PUBLIC_IP"

Binary image files added (not shown): 133 KiB, 199 KiB, 46 KiB, 79 KiB, 3.4 KiB, 34 KiB.

@@ -0,0 +1,80 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Kubeflow Pipelines MNIST example

Run this script to compile pipeline
"""


import kfp.dsl as dsl
import kfp.gcp as gcp


@dsl.pipeline(
    name='MNIST',
    description='A pipeline to train and serve the MNIST example.'
)
def mnist_pipeline(model_export_dir='gs://your-bucket/export',
                   train_steps='200',
                   learning_rate='0.01',
                   batch_size='100'):
    """
    Pipeline with three stages:
      1. train an MNIST classifier
      2. deploy a tf-serving instance to the cluster
      3. deploy a web-ui to interact with it
    """
    train = dsl.ContainerOp(
        name='train',
        image='gcr.io/kubeflow-examples/mnist/model:v20190304-v0.2-176-g15d997b',
        arguments=[
            "/opt/model.py",
            "--tf-export-dir", model_export_dir,
            "--tf-train-steps", train_steps,
            "--tf-batch-size", batch_size,
            "--tf-learning-rate", learning_rate
        ]
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))

    serve = dsl.ContainerOp(
        name='serve',
        image='gcr.io/ml-pipeline/ml-pipeline-kubeflow-deployer:\
7775692adf28d6f79098e76e839986c9ee55dd61',
        arguments=[
            '--model-export-path', model_export_dir,
            '--server-name', "mnist-service"
        ]
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))
    serve.after(train)

    web_ui = dsl.ContainerOp(
        name='web-ui',
        image='gcr.io/kubeflow-examples/mnist/deploy-service:latest',
        arguments=[
            '--image', 'gcr.io/kubeflow-examples/mnist/web-ui:\
v20190304-v0.2-176-g15d997b-pipelines',
            '--name', 'web-ui',
            '--container-port', '5000',
            '--service-port', '80',
            '--service-type', "LoadBalancer"
        ]
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))

    web_ui.after(serve)


if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(mnist_pipeline, __file__ + '.tar.gz')

@@ -0,0 +1,3 @@
python-dateutil
https://storage.googleapis.com/ml-pipeline/release/0.1.9/kfp.tar.gz
kubernetes==8.0.0