## Overview

TensorFlow Extended (TFX) is a Google-production-scale machine learning platform built on TensorFlow. It provides a configuration framework for expressing ML pipelines composed of TFX components. Kubeflow Pipelines (KFP) can be used as the orchestrator that executes a TFX pipeline.

This directory contains two samples that demonstrate how to author an ML pipeline in TFX and run it on a KFP deployment:

- `parameterized_tfx_oss.py` is a Python script that outputs a compiled KFP workflow, which you can submit to a KFP deployment to run.
- `parameterized_tfx_oss.ipynb` is a notebook version of `parameterized_tfx_oss.py`; it also includes guidance on setting up its dependencies.

Please refer to the inline comments for the purpose of each step in both samples.
## Compilation

- `parameterized_tfx_oss.py`: To compile the Python sample, you need TFX installed at version 1.0.0 (`python3 -m pip install tfx==1.0.0`). Then, from the sample directory, run `python3 parameterized_tfx_oss.py` to compile the TFX pipeline into a KFP pipeline package. The compilation is done by invoking `kfp_runner.run(pipeline)` in the script.
- `parameterized_tfx_oss.ipynb`: The notebook sample includes the installation of the required dependencies as its first step.
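The compile step above can be sketched roughly as follows. This is a hypothetical outline, not the sample's actual code: the module path and the `output_filename` argument are assumptions about the TFX 1.0.0 API, so verify them against `parameterized_tfx_oss.py`.

```python
# Hypothetical sketch of the compile step; the real logic lives in
# parameterized_tfx_oss.py.
def compile_pipeline(pipeline, package_name="parameterized_tfx_oss.yaml"):
    # Deferred import: requires `python3 -m pip install tfx==1.0.0`.
    from tfx.orchestration.kubeflow.kubeflow_dag_runner import KubeflowDagRunner

    # KubeflowDagRunner.run(...) writes a KFP pipeline package file
    # instead of executing the pipeline.
    KubeflowDagRunner(output_filename=package_name).run(pipeline)
```

The resulting package file is what you later upload to the KFP deployment.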
## Permission

> ⚠️ If you are using a full-scope or Workload Identity enabled cluster in the hosted pipelines beta version, DO NOT follow this section. However, you will still need to enable the corresponding GCP APIs.

This pipeline requires Google Cloud Storage permission to run.

- If KFP was deployed through the Kubernetes marketplace, please follow the instructions in the guideline to make sure the service account has the `storage.admin` role.
- If KFP was deployed through standalone deployment, please refer to Authenticating Pipelines to GCP to provide the `storage.admin` permission.
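As one possible way to add the IAM binding on a standalone deployment, the role can be granted with `gcloud`; `PROJECT_ID` and `SA_EMAIL` below are placeholders for your own project and pipeline-runner service account, and the linked guides remain the authoritative instructions.

```shell
# Placeholder values: substitute your own PROJECT_ID and SA_EMAIL.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:SA_EMAIL" \
  --role="roles/storage.admin"
```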
## Execution

- `parameterized_tfx_oss.py`: You can submit the compiled package to a KFP deployment and run it from the UI.
- `parameterized_tfx_oss.ipynb`: In the last step of the notebook, the execution of the pipeline is invoked via the KFP SDK client. You also have the option to submit and run it manually from the UI.
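The SDK-client submission mentioned above can be sketched as below. The endpoint URL and package filename are assumptions, and actually calling the helper requires `pip install kfp` and a reachable KFP deployment.

```python
# Hypothetical helper; the host URL and package path are placeholders.
def submit_pipeline(package_path="parameterized_tfx_oss.yaml",
                    host="https://your-kfp-endpoint.example.com"):
    import kfp  # deferred import: requires `pip install kfp`

    client = kfp.Client(host=host)
    # Creates a run directly from the compiled pipeline package.
    return client.create_run_from_pipeline_package(
        pipeline_file=package_path,
        arguments={},
    )
```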
## Known issues

- This approach only works for string-typed quantities. For example, you cannot parameterize `num_steps` of `Trainer` in this way.
- Parameter names must be unique.
- By default, the pipeline root is always parameterized, under the name `pipeline-root`.
- If a parameter is referenced in multiple places, make sure it is correctly converted to the string-formatted placeholder by calling `str(your_param)`.
- The best practice is to point the TFX pipeline root at an empty directory. In this sample, Argo does that automatically by plugging the workflow's unique ID (represented by `kfp.dsl.RUN_ID_PLACEHOLDER`) into the pipeline root path.
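To illustrate the last two points, the sketch below shows (a) the Argo-style placeholder that `str(your_param)` produces in the KFP v1 SDK, and (b) how the run-ID placeholder makes each run's pipeline root unique. The literal placeholder strings mirror the KFP v1 values; the bucket and parameter names are hypothetical.

```python
import os

# (a) Calling str(...) on a KFP v1 PipelineParam yields an Argo-style
# placeholder of this shape (the parameter name "data-root" is made up):
param_placeholder = "{{pipelineparam:op=;name=data-root}}"

# (b) kfp.dsl.RUN_ID_PLACEHOLDER is the Argo workflow-UID template below;
# appending it to the root gives every run a fresh, empty directory.
RUN_ID_PLACEHOLDER = "{{workflow.uid}}"
pipeline_root = os.path.join("gs://my-bucket/tfx_root", RUN_ID_PLACEHOLDER)
print(pipeline_root)  # gs://my-bucket/tfx_root/{{workflow.uid}}
```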