3 Commits

Author SHA1 Message Date
Alexey Volkov fa3b3043c6
Components - Added support for Dataflow in TFX components (#3684)
* Components - Added support for Dataflow in TFX components

To use Dataflow, pass beam_pipeline_args to a component.
```
transformer_op(
    ...,
    beam_pipeline_args = [
        '--runner=DataflowRunner',
        '--experiments=shuffle_mode=auto',
        '--project=' + project_id,
        '--temp_location=' + gcs_bucket + '/tmp',
        '--region=' + gcp_region,
        '--disk_size_gb=50',
    ],
)
```

These components use URI-based I/O since TFX with Beam's DataflowRunner only supports GCS URIs for inputs and outputs. With URI-based I/O, the user must specify all output URIs themselves (e.g. `CsvExampleGen(..., output_examples_uri=...)`). Do not forget to do so. The `kfp.dsl.EXECUTION_ID_PLACEHOLDER` object can help construct execution-unique URIs, but if the component has multiple output URIs, you will need to add a different prefix for each output.
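
A minimal sketch of one way to build such URIs follows. The helper `make_output_uri`, the bucket path, and the output names are hypothetical and not part of the components. A plain string stands in for `kfp.dsl.EXECUTION_ID_PLACEHOLDER` so the snippet runs without KFP installed.

```python
def make_output_uri(root_uri: str, execution_id: str, output_name: str) -> str:
    """Join a GCS root, an execution-unique ID, and a per-output name."""
    # Strip a trailing slash from the root so the result never contains '//'.
    return '/'.join([root_uri.rstrip('/'), execution_id, output_name])


# Assumption: in a real pipeline this would be kfp.dsl.EXECUTION_ID_PLACEHOLDER,
# which is substituted with a unique value when the pipeline runs.
execution_id = '<execution-id>'

# Hypothetical bucket path used only for illustration.
root = 'gs://my-bucket/tfx_outputs'

# Each output gets its own name so that multiple outputs do not collide.
examples_uri = make_output_uri(root, execution_id, 'examples')
statistics_uri = make_output_uri(root, execution_id, 'statistics')
```

The resulting strings would then be passed as the output URI arguments of the component, for example `output_examples_uri=examples_uri`.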

There is a bug in TFX+Beam that prevents using DataflowRunner, but these components contain a workaround. The workaround can be removed when the fixed version of TFX is released (ddb01c0242).

* Added the TFX on KFP Dataflow sample

* Updated the README.md file

* Enabled the blessing output of the Evaluator

The Evaluator does not always write to that URI, but for components with URI-based I/O this does not matter.

* Fixed the indent in YAML

* Addressed the review feedback

* Updated the sample after the component changes

* Fixed the Dataflow casing in the sample name

* Using channel_utils.unwrap_channel_dict

* Updated the sample pipeline

* Shortened the .get expressions

* Updated the sample
2020-05-06 13:37:08 -07:00
Alexey Volkov c1ab0010be
Components - Upgraded the TFX components to 0.21.4 (#3641)
* Updated and synced the generated code

There is only 1 line of component-specific code in each component function (apart from the function signature).

* Updated some components that had older version of the generated code. The generated code is now the same everywhere.
* `input_channels_with_splits` is now generated based on the input artifact types
* TFX broke back compat: Removed `.split` from the artifacts. The components seem to now assume there is a single artifact in the channel.
* TFX broke back compat: changed the way artifact instances are created
* Updated container image to 0.21.4. There might have been backwards incompatible input/output changes - need to check and update.

* Updated component signatures

* Updated the generated component.yaml files

* Updated the sample notebook

* Removed the optional output in Evaluator

Optional outputs are not supported yet. I'm not sure they're even correct according to MLMD.

* Updated the sample
2020-04-29 01:40:24 -07:00
Alexey Volkov b63472062b
Components - TFX (#2671)
* Added CsvExampleGen component

* Switched to using some processing code from the component class

Needs testing

* Renamed output_examples to example_artifacts for consistency with the original component

* Fixed the docstring a bit

* Added StatisticsGen

First draft

* Added SchemaGen

First draft

* Fixed the input_dict construction

* Use None defaults

* Switched to TFX container image

* Updated component definitions

* Fixed StatisticsGen and SchemaGen

Input artifacts must have splits.
Split URIs should end with "/".
The components now work.

Also printing component_class_instance for debugging.
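
The trailing-slash requirement on split URIs noted above can be sketched as follows. The helper names and the `<base>/<split>/` layout are assumptions for illustration, not code from the components.

```python
def ensure_trailing_slash(uri: str) -> str:
    """Return the URI unchanged if it already ends with '/', else append one."""
    return uri if uri.endswith('/') else uri + '/'


def make_split_uri(base_uri: str, split_name: str) -> str:
    """Build a per-split URI that ends with '/'.

    Assumption: each split lives in a subdirectory named after the split.
    """
    return ensure_trailing_slash(ensure_trailing_slash(base_uri) + split_name)


# Hypothetical base URI and split names, used only for illustration.
train_uri = make_split_uri('gs://my-bucket/examples', 'train')
eval_uri = make_split_uri('gs://my-bucket/examples/', 'eval')
```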

* Printing component instance in CsvExampleGen

* Moved components to directories

* Updated the sample TFX pipeline

* Renamed ExamplesPath to Examples for data passing components

* Corrected output_component_file paths

* Added the Transform component

The component uses almost completely generic code.

* Added the Trainer component

* Added the Evaluator component

* Added the ExampleValidator component

* Added the BigQueryExampleGen component

* Added the ImportExampleGen component

* Updated the sample

Added ExampleValidator, Transform, Trainer, Evaluator

* Upgraded to TFX 0.15.0

* Upgraded the sample to 0.15.0

* Silence Flake8 for annotations
2019-12-04 17:52:31 -08:00