pipelines/samples/tfx

The taxi-cab-classification-pipeline.py sample runs a pipeline that uses the TensorFlow Transform and TensorFlow Model Analysis components.

The dataset

This sample is based on the model-analysis example here.

The sample trains and analyzes a model based on the Taxi Trips dataset released by the City of Chicago.

Note: This site provides applications using data that has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one's own risk.

Read more about the dataset in Google BigQuery. Explore the full dataset in the BigQuery UI.

Requirements

Preprocessing and model analysis use Apache Beam.

When run in cloud mode (instead of local mode), those steps use Google Cloud Dataflow to run the Beam pipelines.

Therefore, you must enable the Dataflow API for the given GCP project if you want to use cloud mode for either preprocessing or analysis. See the guide to enabling the Dataflow API.
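The difference between the two modes comes down to which Beam runner executes the pipeline. The following is a minimal sketch of that choice, not the code used by the sample's components: the helper name beam_pipeline_args and its parameters are hypothetical, while DirectRunner, DataflowRunner, --project and --temp_location are standard Apache Beam pipeline options.

```python
def beam_pipeline_args(mode, project=None, temp_location=None):
    """Build Beam command-line options for a given mode.

    Hypothetical helper for illustration; the sample's containers may
    assemble their options differently.
    """
    if mode == "local":
        # DirectRunner executes the Beam pipeline in the local process.
        return ["--runner=DirectRunner"]
    if mode == "cloud":
        # DataflowRunner submits the job to Google Cloud Dataflow, which
        # needs a GCP project (with the Dataflow API enabled) and a
        # Cloud Storage location for temporary files.
        if not project or not temp_location:
            raise ValueError("cloud mode needs a project and a temp_location")
        return [
            "--runner=DataflowRunner",
            "--project=" + project,
            "--temp_location=" + temp_location,
        ]
    raise ValueError("mode must be 'local' or 'cloud', got %r" % (mode,))
```

A list built this way could be handed to Beam's PipelineOptions when constructing a pipeline; the project and bucket values are whatever you supply at run time.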

Compiling the pipeline template

Follow the guide to building a pipeline to install the Kubeflow Pipelines SDK, then run the following command to compile the sample Python into a workflow specification. The specification takes the form of a YAML file compressed into a .tar.gz file.

dsl-compile --py taxi-cab-classification-pipeline.py --output taxi-cab-classification-pipeline.tar.gz
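To check what the compiler produced before uploading it, you can read the YAML back out of the archive. The sketch below uses only the Python standard library; the function name read_workflow_yaml is hypothetical, and because the name of the file inside the archive is not stated here, the code simply takes the first .yaml or .yml member it finds.

```python
import tarfile


def read_workflow_yaml(archive_path):
    """Return (member_name, text) for the first YAML file in a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith((".yaml", ".yml")):
                with archive.extractfile(member) as handle:
                    return member.name, handle.read().decode("utf-8")
    raise ValueError("no YAML file found in " + archive_path)


if __name__ == "__main__":
    # Assumes the dsl-compile command above was run in the current directory.
    name, text = read_workflow_yaml("taxi-cab-classification-pipeline.tar.gz")
    print(name)
    print(text[:500])  # show only the beginning of the specification
```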

Deploying the pipeline

Open the Kubeflow Pipelines UI. Create a new pipeline, and then upload the compiled specification (.tar.gz file) as a new pipeline template.

The pipeline requires two arguments:

  1. The name of a GCP project.
  2. An output directory in a Google Cloud Storage bucket, of the form gs://<BUCKET>/<PATH>.
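A malformed output directory is easier to catch before a run starts than partway through it. As a rough sketch, the hypothetical helper below checks that a value has the gs://<BUCKET>/<PATH> form described above and splits it into its two parts. It is stricter than Cloud Storage itself, since it rejects a bare bucket with no path, and it does not check that the bucket exists.

```python
import re

# gs://<BUCKET>/<PATH>: a bucket name without slashes, then a non-empty path.
_GCS_URI = re.compile(r"^gs://(?P<bucket>[^/]+)/(?P<path>.+)$")


def split_gcs_uri(uri):
    """Split 'gs://<BUCKET>/<PATH>' into (bucket, path), or raise ValueError."""
    match = _GCS_URI.match(uri)
    if match is None:
        raise ValueError("expected gs://<BUCKET>/<PATH>, got %r" % (uri,))
    return match.group("bucket"), match.group("path")
```

For example, split_gcs_uri("gs://my-bucket/tfx/output") returns ("my-bucket", "tfx/output"), while a value without the gs:// prefix raises ValueError. The bucket and path in this example are placeholders.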

Components source

Preprocessing: source code, container

Training: source code, container

Analysis: source code, container

Prediction: source code, container