159 lines
4.0 KiB
Plaintext
159 lines
4.0 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# TFX on KubeFlow Pipelines Example\n",
|
|
"\n",
|
|
"This notebook should be run inside a KF Pipelines cluster.\n",
|
|
"\n",
|
|
"### Install TFX and KFP packages"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"!pip3 install 'tfx==0.15.0' --upgrade\n",
|
|
"!python3 -m pip install 'kfp>=0.1.35' --quiet"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Enable DataFlow API for your GKE cluster\n",
|
|
"<https://console.developers.google.com/apis/api/dataflow.googleapis.com/overview>\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Get the TFX repo with sample pipeline\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"scrolled": true,
|
|
"tags": [
|
|
"parameters"
|
|
]
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Directory and data locations (uses Google Cloud Storage).\n",
|
|
"import os\n",
|
|
"_input_bucket = '<your gcs bucket>'\n",
|
|
"_output_bucket = '<your gcs bucket>'\n",
|
|
"_pipeline_root = os.path.join(_output_bucket, 'tfx')\n",
|
|
"\n",
|
|
"# Google Cloud Platform project id to use when deploying this pipeline.\n",
|
|
"_project_id = '<your project id>'"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# copy the trainer code to a storage bucket as the TFX pipeline will need that code file in GCS\n",
|
|
"from tensorflow.compat.v1 import gfile\n",
|
|
"gfile.Copy('examples/penguin/penguin_utils_cloud_tuner.py', _input_bucket + '/penguin_utils_cloud_tuner.py')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Configure the TFX pipeline example\n",
|
|
"\n",
|
|
"Reload this cell by running the load command to get the pipeline configuration file\n",
|
|
"```\n",
|
|
"%load https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/penguin_pipeline_kubeflow.py\n",
|
|
"```\n",
|
|
"\n",
|
|
"Configure:\n",
|
|
"- Set `_input_bucket` to the GCS directory where you've copied penguin_utils_cloud_tuner.py. I.e. gs://<my bucket>/<path>/\n",
|
|
"- Set `_output_bucket` to the GCS directory where you've want the results to be written\n",
|
|
"- Set GCP project ID (replace my-gcp-project). Note that it should be project ID, not project name.\n",
|
|
"\n",
|
|
"The dataset in BigQuery has 100M rows, you can change the query parameters in WHERE clause to limit the number of rows used.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%load https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/penguin_pipeline_kubeflow.py"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Submit pipeline for execution on the Kubeflow cluster"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import kfp\n",
|
|
"\n",
|
|
"run_result = kfp.Client(\n",
|
|
" host=None # replace with Kubeflow Pipelines endpoint if this notebook is run outside of the Kubeflow cluster.\n",
|
|
").create_run_from_pipeline_package('penguin_kubeflow.tar.gz', arguments={})"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.5.6"
|
|
},
|
|
"pycharm": {
|
|
"stem_cell": {
|
|
"cell_type": "raw",
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"source": []
|
|
}
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|