* Components - Added support for Dataflow in TFX components
To use Dataflow, pass beam_pipeline_args to a component.
```
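# transformer_op, project_id, gcs_bucket and gcp_region are defined elsewhere in the pipeline.
transformer_op(
    ...,  # other component arguments
    beam_pipeline_args=[
        '--runner=DataflowRunner',
        '--experiments=shuffle_mode=auto',
        '--project=' + project_id,
        '--temp_location=' + gcs_bucket + '/tmp',
        '--region=' + gcp_region,
        '--disk_size_gb=50',
    ],
)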
```
These components use URI-based I/O because TFX with Beam's DataflowRunner only supports GCS URIs for inputs and outputs. With URI-based I/O, the user must specify every output URI explicitly (e.g. `CsvExampleGen(..., output_examples_uri=...)`); do not forget to do so. The `kfp.dsl.EXECUTION_ID_PLACEHOLDER` object can help construct execution-unique URIs, but if a component has multiple outputs, you will also need to add a prefix that is different for each output.
There is a bug in TFX+Beam that prevents using DataflowRunner, but these components contain a workaround. The workaround can be removed once the fixed version of TFX is released (ddb01c0242).
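A minimal sketch of building one URI per output that is unique both per execution and per output. `make_output_uris` is a hypothetical helper, not a KFP or TFX function, and the bucket and output names are placeholders. In a real pipeline the `execution_id` argument would be `kfp.dsl.EXECUTION_ID_PLACEHOLDER`, which KFP substitutes at run time.

```python
from typing import Dict, Iterable


def make_output_uris(base_uri: str, execution_id: str,
                     output_names: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: builds one GCS URI per output.

    The output name acts as the per-output prefix and the execution ID makes
    the path unique per execution (pass kfp.dsl.EXECUTION_ID_PLACEHOLDER here
    inside a KFP pipeline).
    """
    base = base_uri.rstrip('/')
    return {name: f'{base}/{name}/{execution_id}' for name in output_names}


# Example (hypothetical bucket and output names):
# uris = make_output_uris('gs://my-bucket/root', kfp.dsl.EXECUTION_ID_PLACEHOLDER,
#                         ['examples'])
# CsvExampleGen(..., output_examples_uri=uris['examples'])
```

Putting the output name before the execution ID keeps all executions of one output under a common directory; the reverse order would group outputs by run instead. Either works as long as each output gets a distinct path.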
* Added the TFX on KFP Dataflow sample
* Updated the README.md file
* Enabled the blessing output of the Evaluator
The Evaluator does not always write to that URI, but for components with URI-based I/O this does not matter.
* Fixed the indentation in the YAML
* Addressed the review feedback
* Updated the sample after the component changes
* Fixed the Dataflow casing in the sample name
* Switched to using `channel_utils.unwrap_channel_dict`
* Updated the sample pipeline
* Shortened the `.get` expressions
* Updated the sample
* Updated and synced the generated code
There is only one line of component-specific code in each component function (apart from the function signature).
* Updated some components that had an older version of the generated code. The generated code is now the same everywhere.
* `input_channels_with_splits` is now generated based on the input artifact types
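A minimal sketch of what deriving `input_channels_with_splits` from the input artifact types could look like. The function name, input names, and type names are hypothetical; the set of split-bearing types is passed in rather than hard-coded because this message does not say which types those are.

```python
from typing import AbstractSet, Mapping, Set


def get_input_channels_with_splits(
        input_artifact_types: Mapping[str, str],
        split_artifact_types: AbstractSet[str]) -> Set[str]:
    """Hypothetical helper: returns the names of inputs whose type has splits."""
    return {
        name
        for name, type_name in input_artifact_types.items()
        if type_name in split_artifact_types
    }
```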
* TFX broke backward compatibility: `.split` was removed from artifacts. The components now seem to assume there is a single artifact in each channel.
* TFX broke backward compatibility: the way artifact instances are created has changed
* Updated the container image to 0.21.4. There might have been backward-incompatible input/output changes; these still need to be checked and the components updated.
* Updated component signatures
* Updated the generated component.yaml files
* Updated the sample notebook
* Removed the optional output in Evaluator
Optional outputs are not supported yet. I'm not sure they're even correct according to MLMD.
* Updated the sample
* Added CsvExampleGen component
* Switched to using some processing code from the component class
Needs testing
* Renamed output_examples to example_artifacts for consistency with the original component
* Fixed the docstring a bit
* Added StatisticsGen
First draft
* Added SchemaGen
First draft
* Fixed the input_dict construction
* Used `None` defaults
* Switched to TFX container image
* Updated component definitions
* Fixed StatisticsGen and SchemaGen
Input artifacts must have splits.
Split URIs should end with "/".
The components now work.
Also printing component_class_instance for debugging.
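A minimal sketch of building split URIs that satisfy the trailing-slash rule above. `split_uri` is a hypothetical helper, not a TFX function, and the split names `train` and `eval` are assumptions for illustration.

```python
import posixpath


def split_uri(artifact_uri: str, split: str) -> str:
    """Hypothetical helper: returns the URI of one split, always ending with '/'."""
    # Joining a trailing empty component appends the '/' separator, and
    # posixpath avoids doubling it when artifact_uri already ends with '/'.
    return posixpath.join(artifact_uri, split, '')


# split_uri('gs://my-bucket/examples', 'train') returns 'gs://my-bucket/examples/train/'
```

`posixpath` is used instead of `os.path` because GCS URIs always use `/` as the separator, regardless of the operating system the code runs on.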
* Printing component instance in CsvExampleGen
* Moved components to directories
* Updated the sample TFX pipeline
* Renamed ExamplesPath to Examples for data passing components
* Corrected output_component_file paths
* Added the Transform component
The component uses almost completely generic code.
* Added the Trainer component
* Added the Evaluator component
* Added the ExampleValidator component
* Added the BigQueryExampleGen component
* Added the ImportExampleGen component
* Updated the sample
Added ExampleValidator, Transform, Trainer, Evaluator
* Upgraded to TFX 0.15.0
* Upgraded the sample to 0.15.0
* Silenced Flake8 warnings for annotations