mirror of https://github.com/kubeflow/examples.git
Use latest kubeflow release branch v0.3.4-rc.1 (#365)
* Remove separate pipelines installation
* Update kfp version to 0.1.3-rc.2
* Clarify difference in installation paths (click-to-deploy vs CLI)
* Use set_gpu_limit() and remove generated yaml with resource limits
parent 31390d39a0
commit 6fcb28bc26
@@ -6,17 +6,13 @@ presentation to public audiences.
 The base demo includes the following steps:
 
 1. [Setup your environment](#1-setup-your-environment)
-1. [Create a GKE cluster and install Kubeflow](#2-create-a-gke-cluster-and-install-kubeflow)
-1. [Install pipelines on GKE](#3-install-pipelines-on-gke)
+1. [Create a GKE cluster and install Kubeflow with pipelines](#2-create-a-gke-cluster-and-install-kubeflow-with-pipelines)
 
 ## 1. Setup your environment
 
 Clone the [kubeflow/kubeflow](https://github.com/kubeflow/kubeflow) repo and
 checkout the
-[`v0.3.3`](https://github.com/kubeflow/kubeflow/releases/tag/v0.3.3) branch.
-Clone the [kubeflow/pipelines](https://github.com/kubeflow/pipelines) repo and
-checkout the
-[`0.1.2`](https://github.com/kubeflow/pipelines/releases/tag/0.1.2) branch.
+[`v0.3.4-rc.1`](https://github.com/kubeflow/kubeflow/releases/tag/v0.3.4-rc.1) branch.
 
 Ensure that the repo paths, project name, and other variables are set correctly.
 When all overrides are set, source the environment file:
@@ -35,16 +31,28 @@ source activate kfp
 Install the Kubeflow Pipelines SDK:
 
 ```
-pip install https://storage.googleapis.com/ml-pipeline/release/0.1.2/kfp.tar.gz --upgrade
+pip install https://storage.googleapis.com/ml-pipeline/release/0.1.3-rc.2/kfp.tar.gz --upgrade
 ```
 
-## 2. Create a GKE cluster and install Kubeflow
+If you encounter any errors, run this before repeating the previous command:
 
-Creating a cluster with click-to-deploy does not yet support the installation of
-pipelines. It is not useful for demonstrating pipelines, but is still worth showing.
+```
+pip uninstall kfp
+```
+
+## 2. Create a GKE cluster and install Kubeflow with pipelines
+
+Choose one of the following options for creating a cluster and installing
+Kubeflow with pipelines:
+
+* Click-to-deploy
+* CLI (kfctl)
 
 ### Click-to-deploy
 
+This is the recommended path if you do not require access to GKE beta features
+such as node auto-provisioning (NAP).
+
 Generate a web app Client ID and Client Secret by following the instructions
 [here](https://www.kubeflow.org/docs/started/getting-started-gke/#create-oauth-client-credentials).
 Save these as environment variables for easy access.
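A quick, optional way to confirm the upgraded SDK works is to compile a trivial pipeline with it. The sketch below is illustrative only and is not part of the demo; it assumes the kfp 0.1.x `dsl` and `compiler` modules, and the pipeline name, image, and output filename are arbitrary choices for this example.

```python
# Hypothetical smoke test for the kfp SDK install: define a one-step pipeline
# and compile it into a .tar.gz package.
import kfp.dsl as dsl
import kfp.compiler as compiler


@dsl.pipeline(name='sdk-smoke-test', description='Verify the kfp SDK install')
def sdk_smoke_test():
  # A single step that just echoes a message, using the same bash image the
  # demo pipeline uses for postprocessing.
  dsl.ContainerOp(
      name='echo',
      image='library/bash:4.4.23',
      command=['sh', '-c'],
      arguments=['echo "kfp SDK is working"'])


if __name__ == '__main__':
  # Writes sdk_smoke_test.tar.gz to the current directory.
  compiler.Compiler().compile(sdk_smoke_test, 'sdk_smoke_test.tar.gz')
```

If this compiles cleanly, the SDK is installed correctly; the resulting package can later be uploaded through the pipelines UI.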
@@ -58,11 +66,13 @@ In the [GCP Console](https://console.cloud.google.com/kubernetes), navigate to the
 Kubernetes Engine panel to watch the cluster creation process. This results in a
 full cluster with Kubeflow installed.
 
-### kfctl
+### CLI (kfctl)
 
-While node autoprovisioning is in beta, it must be enabled manually. To create
-a cluster with autoprovisioning, run the following commands, which will take
-around 30 minutes:
+If you require GKE beta features such as node autoprovisioning (NAP), these
+instructions describe manual cluster creation.
+
+To create a cluster with autoprovisioning, run the following commands
+(estimated: 30 minutes):
 
 ```
 gcloud container clusters create ${CLUSTER} \
@@ -162,18 +172,6 @@ kfctl generate k8s
 kfctl apply k8s
 ```
 
-## 3. Install pipelines on GKE
-
-```
-kubectl create clusterrolebinding sa-admin --clusterrole=cluster-admin --serviceaccount=kubeflow:pipeline-runner
-cd ks_app
-ks registry add ml-pipeline "${PIPELINES_REPO}/ml-pipeline"
-ks pkg install ml-pipeline/ml-pipeline
-ks generate ml-pipeline ml-pipeline
-ks param set ml-pipeline namespace kubeflow
-ks apply default -c ml-pipeline
-```
-
 View the installed components in the GCP Console. In the
 [Kubernetes Engine](https://console.cloud.google.com/kubernetes)
 section, you will see a new cluster ${CLUSTER}. Under
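The README checks the installed components through the GCP Console. If a programmatic check is preferred, a rough sketch using the official `kubernetes` Python client (an extra dependency, not referenced by the demo) could look like this:

```python
# Hypothetical check: list the pods in the kubeflow namespace to confirm that
# the Kubeflow and pipelines components came up.
from kubernetes import client, config

# Reuses the kubeconfig/context that gcloud and kfctl configured.
config.load_kube_config()
core = client.CoreV1Api()

for pod in core.list_namespaced_pod(namespace='kubeflow').items:
  print(pod.metadata.name, pod.status.phase)
```

Pods stuck outside the Running or Succeeded phases are the first thing to investigate before moving on.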
@@ -38,7 +38,7 @@ def kubeflow_training(
   num_layers: kfp.PipelineParam = kfp.PipelineParam(name='numlayers', value='2'),
   optimizer: kfp.PipelineParam = kfp.PipelineParam(name='optimizer', value='ftrl')):
 
-  training = training_op(learning_rate, num_layers, optimizer)
+  training = training_op(learning_rate, num_layers, optimizer).set_gpu_limit(1)
   postprocessing = postprocessing_op(training.output) # pylint: disable=unused-variable
 
 if __name__ == '__main__':
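For context on the change above: calling `.set_gpu_limit(1)` on the training op is what makes the compiler emit the `nvidia.com/gpu` resource limit, so the pre-generated workflow YAML below no longer needs to be checked in. The sketch that follows is a simplified illustration rather than the demo's actual code (its `training_op` presumably lives elsewhere in the same file); the image, command, and arguments are taken from the removed YAML.

```python
# Simplified sketch of the pattern used by the demo pipeline.
import kfp.dsl as dsl


def training_op(learning_rate, num_layers, optimizer):
  # Same container the removed workflow YAML ran for its training step.
  return dsl.ContainerOp(
      name='training',
      image='katib/mxnet-mnist-example',
      command=['python', '/mxnet/example/image-classification/train_mnist.py'],
      arguments=[
          '--batch-size', '64',
          '--lr', learning_rate,
          '--num-layers', num_layers,
          '--optimizer', optimizer,
      ],
      file_outputs={'output': '/etc/timezone'})


@dsl.pipeline(name='gpu-example', description='GPU training example')
def gpu_example(
    learning_rate: dsl.PipelineParam = dsl.PipelineParam(name='learning-rate', value='0.1'),
    num_layers: dsl.PipelineParam = dsl.PipelineParam(name='num-layers', value='2'),
    optimizer: dsl.PipelineParam = dsl.PipelineParam(name='optimizer', value='ftrl')):
  # set_gpu_limit(1) makes the compiler emit "nvidia.com/gpu: 1" under
  # resources.limits in the compiled Argo workflow.
  training = training_op(learning_rate, num_layers, optimizer).set_gpu_limit(1)  # pylint: disable=unused-variable
```

Compiling such a pipeline yields a workflow whose training template carries the same `resources.limits` block that was previously hand-maintained in the YAML below.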
@@ -1,136 +0,0 @@
-apiVersion: argoproj.io/v1alpha1
-kind: Workflow
-metadata:
-  generateName: pipeline-gpu-example-
-spec:
-  arguments:
-    parameters:
-    - name: learning-rate
-      value: '0.1'
-    - name: num-layers
-      value: '2'
-    - name: optimizer
-      value: ftrl
-  entrypoint: pipeline-gpu-example
-  serviceAccountName: pipeline-runner
-  templates:
-  - dag:
-      tasks:
-      - arguments:
-          parameters:
-          - name: training-output
-            value: '{{tasks.training.outputs.parameters.training-output}}'
-        dependencies:
-        - training
-        name: postprocessing
-        template: postprocessing
-      - arguments:
-          parameters:
-          - name: learning-rate
-            value: '{{inputs.parameters.learning-rate}}'
-          - name: num-layers
-            value: '{{inputs.parameters.num-layers}}'
-          - name: optimizer
-            value: '{{inputs.parameters.optimizer}}'
-        name: training
-        template: training
-    inputs:
-      parameters:
-      - name: learning-rate
-      - name: num-layers
-      - name: optimizer
-    name: pipeline-gpu-example
-  - container:
-      args:
-      - echo "{{inputs.parameters.training-output}}"
-      command:
-      - sh
-      - -c
-      image: library/bash:4.4.23
-    inputs:
-      parameters:
-      - name: training-output
-    name: postprocessing
-    outputs:
-      artifacts:
-      - name: mlpipeline-ui-metadata
-        path: /mlpipeline-ui-metadata.json
-        s3:
-          accessKeySecret:
-            key: accesskey
-            name: mlpipeline-minio-artifact
-          bucket: mlpipeline
-          endpoint: minio-service.kubeflow:9000
-          insecure: true
-          key: runs/{{workflow.uid}}/{{pod.name}}/mlpipeline-ui-metadata.tgz
-          secretKeySecret:
-            key: secretkey
-            name: mlpipeline-minio-artifact
-      - name: mlpipeline-metrics
-        path: /mlpipeline-metrics.json
-        s3:
-          accessKeySecret:
-            key: accesskey
-            name: mlpipeline-minio-artifact
-          bucket: mlpipeline
-          endpoint: minio-service.kubeflow:9000
-          insecure: true
-          key: runs/{{workflow.uid}}/{{pod.name}}/mlpipeline-metrics.tgz
-          secretKeySecret:
-            key: secretkey
-            name: mlpipeline-minio-artifact
-  - container:
-      args:
-      - --batch-size
-      - '64'
-      - --lr
-      - '{{inputs.parameters.learning-rate}}'
-      - --num-layers
-      - '{{inputs.parameters.num-layers}}'
-      - --optimizer
-      - '{{inputs.parameters.optimizer}}'
-      command:
-      - python
-      - /mxnet/example/image-classification/train_mnist.py
-      image: katib/mxnet-mnist-example
-      resources:
-        limits:
-          nvidia.com/gpu: 1
-    inputs:
-      parameters:
-      - name: learning-rate
-      - name: num-layers
-      - name: optimizer
-    name: training
-    outputs:
-      artifacts:
-      - name: mlpipeline-ui-metadata
-        path: /mlpipeline-ui-metadata.json
-        s3:
-          accessKeySecret:
-            key: accesskey
-            name: mlpipeline-minio-artifact
-          bucket: mlpipeline
-          endpoint: minio-service.kubeflow:9000
-          insecure: true
-          key: runs/{{workflow.uid}}/{{pod.name}}/mlpipeline-ui-metadata.tgz
-          secretKeySecret:
-            key: secretkey
-            name: mlpipeline-minio-artifact
-      - name: mlpipeline-metrics
-        path: /mlpipeline-metrics.json
-        s3:
-          accessKeySecret:
-            key: accesskey
-            name: mlpipeline-minio-artifact
-          bucket: mlpipeline
-          endpoint: minio-service.kubeflow:9000
-          insecure: true
-          key: runs/{{workflow.uid}}/{{pod.name}}/mlpipeline-metrics.tgz
-          secretKeySecret:
-            key: secretkey
-            name: mlpipeline-minio-artifact
-      parameters:
-      - name: training-output
-        valueFrom:
-          path: /etc/timezone
@@ -6,7 +6,7 @@ export CLUSTER=kfdemo
 export PROJECT=${DEMO_PROJECT}
 export ENV=gke
 export SVC_ACCT=minikube
-export KUBEFLOW_TAG=v0.3.3
+export KUBEFLOW_TAG=v0.3.4-rc.1
 export KUBEFLOW_REPO=${HOME}/repos/kubeflow/kubeflow
 export DEMO_REPO=${HOME}/repos/kubeflow/examples/demos/yelp_demo
@@ -7,4 +7,5 @@ export CLUSTER=kubeflow
 export KUBEFLOW_REPO=${HOME}/repos/kubeflow/kubeflow
 export DEMO_REPO=${HOME}/repos/kubeflow/examples/demos/simple_pipeline
 export PIPELINES_REPO=${HOME}/repos/kubeflow/pipelines
+export PIPELINE_VERSION=0.1.3-rc.2