* add placeholder to spec
* add output_directory to pipeline
* respect uri placeholder in file outputs
* wip: add data passing rewriting logic to respect the uri semantics
* merge input_uri and paths when instantiating ContainerOp
* fix
* fix workflow rewriting
* Add topology rewriting
* add a test case, and various fixes
* make the test case more complex
* Fix the case when working with OpsGroup
* Fix test case
* fix resolving test
* fix redundant cmd lines
* fix redundant cmd lines
* resolve comments
* fix file outputs
* resolve comments
* copy file outputs instead of modifying inplace.
A recent PR added changes that architecturally belong to a different module (the component bridge). This introduced unintended dependencies and coupling between the modules. This PR restores the module separation and also makes the code simpler.
* Compile IR proto in setup.py
* compile to IR
* Fix importer node logic and lint
* cleanup and lint
* merge, undo setup.py change
* cleanup and lint
* remove currently unused code
* format _component_bridge.py
* cleanup and format
* cleanup
* upgrade protobuf in test
* restructure and test
* address review comments
* fix bug
* avoid f-strings formatting
* address review comments
* address review comments
* limit the primitive types to only int, double, and string.
* Fix test for python3.5
* use instance_schema instead of schema_title
* add v2 to setup.py
* address review comments
* move the tests closer to the code
* add more tests
* cleanup and linting
* add more tests
* fix bug on input parameter connection
* linting
* restructure tests
* fix python3.5 test failure
* support outputs.parameters placeholder
* remove pipeline decorator from v2.dsl
Previously the process that was used to resolve a child task of a graph component was convoluted:
* Generate a dynamic task factory function for the child task component
* Convert input argument names from original to pythonic names
* Call the generated dynamic factory function using the python arguments to get back a task object
* Convert the task object outputs from pythonic back to original names (recently removed)
This PR significantly simplifies this process to just:
* Directly construct a task object based on the task component and the original arguments
* add tests for pythonic and non-pythonic component outputs
* fix: graph for non-pythonic container output's names
Loading a container component from component.yaml creates both
pythonic and original output names. The graph component iterated over
all outputs and applied the pythonic-to-original conversion to each one.
Names that are not identical to their pythonic versions
raised a KeyError on the lookup table.
This commit fixes the problem by using a default value for the lookup.
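A minimal sketch of the lookup-with-default idea described above. The function name, the mapping, and the output names are hypothetical and only illustrate the technique; they are not the SDK's actual code.

```python
def to_original_output_names(output_names, pythonic_to_original):
    """Map every name to its original spelling.

    Names missing from the table (e.g. names that are already original)
    fall back to themselves instead of raising KeyError.
    """
    return [pythonic_to_original.get(name, name) for name in output_names]


# Hypothetical component with a single output called "Output model".
pythonic_to_original = {"output_model": "Output model"}

# A loaded component exposes both spellings of the output name.
all_names = ["output_model", "Output model"]
resolved = to_original_output_names(all_names, pythonic_to_original)
```

With a plain `pythonic_to_original[name]` lookup, the second name would raise `KeyError`, which is the failure mode the commit describes.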
* remove depythonification of outputs - not needed anymore
* Prepare SDK docs environment so it's easier to understand how to build the docs locally so they're consistent with ReadTheDocs.
* Clean up docstrings for kfp.Client
* Add in updates to the docs for compiler and components
* Update components area to add in code references and make formatting a little more consistent.
* Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks
* Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks
* Remove unused kfp.notebook package links
* Clean up a few more errant references
* Clean up the DSL docs some more
* Update SDK docs for KFP extensions to follow Sphinx guidelines
* Clean up formatting of docstrings after Ark-Kun's comments
* SDK - Components - Calculate component hash digest
The digest is calculated when loading the component from URL, file or text.
Slightly refactored component loading - streams are no longer used, only bytes.
TODO: Calculate the digest if missing
TODO: Report possible digest conflicts
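A rough sketch of digest calculation over raw component bytes. The choice of SHA-256 and the helper name `load_component_bytes` are assumptions made for illustration; the commit message does not state which hash function is used.

```python
import hashlib


def load_component_bytes(data):
    """Return (component_text, digest) for raw component bytes.

    Working on bytes rather than streams means the digest is computed
    over exactly the data that is later parsed.
    """
    digest = hashlib.sha256(data).hexdigest()  # assumed hash function
    return data.decode("utf-8"), digest


text, digest = load_component_bytes(b"name: Hypothetical component\n")
```

Because only bytes are involved, loading from a URL, a file, or text can share this one code path after the bytes have been obtained.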
* Updated the test graph component
* Using the actual digest in the test
The PR is a refactoring.
Split all load_component* methods in _components and _component_store into two steps: loading the spec (_load_component_spec*) and creating a task factory from that spec.
This makes it easier to load the spec without having to create task factory functions.
* SDK - Components refactoring
This change is a pure refactoring of the implementation of component task creation.
For pipelines compiled using the DSL compiler (the compile() function or the command-line program) nothing should change.
The main goal of the refactoring is to change the way the component instantiation can be customized.
Previously, the flow was like this:
`ComponentSpec` + arguments --> `TaskSpec` --resolving+transform--> `ContainerOp`
This PR changes it to a more direct path:
`ComponentSpec` + arguments --constructor--> `ContainerOp`
or
`ComponentSpec` + arguments --constructor--> `TaskSpec`
or
`ComponentSpec` + arguments --constructor--> `SomeCustomTask`
The original approach where the flow always passes through `TaskSpec` had some issues since TaskSpec only accepts string arguments (and two other reference classes). This made it harder to handle custom types of arguments like PipelineParam or Channel.
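The direct "`ComponentSpec` + arguments --constructor--> task" flow could look roughly like the sketch below. All classes and functions here are simplified, hypothetical stand-ins and not the SDK's real types; the point is only that the constructor is pluggable and receives the arguments unmodified, so non-string argument objects survive.

```python
from typing import List, NamedTuple


class ComponentSpec(NamedTuple):
    """Hypothetical, heavily reduced stand-in for the real ComponentSpec."""
    name: str
    input_names: List[str]


class PipelineParam(NamedTuple):
    """Hypothetical non-string argument type."""
    name: str


def create_task_factory(spec, constructor):
    """Return a factory that passes spec + arguments straight to constructor."""
    def task_factory(**arguments):
        unknown = sorted(set(arguments) - set(spec.input_names))
        if unknown:
            raise TypeError("Unexpected arguments: {}".format(unknown))
        return constructor(spec, arguments)
    return task_factory


def dict_task_constructor(spec, arguments):
    """One possible constructor: build a plain dict describing the task."""
    return {"component": spec.name, "arguments": dict(arguments)}


spec = ComponentSpec("train", ["data", "epochs"])
param = PipelineParam("epochs")

# The argument object reaches the constructor as-is, not as a string.
task = create_task_factory(spec, dict_task_constructor)(data="data.csv", epochs=param)

# Swapping the constructor yields a different kind of task object.
custom = create_task_factory(spec, lambda s, a: (s.name, sorted(a)))(data="data.csv")
```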
Low-level refactoring changes:
Resolving of command-line argument placeholders has been extracted into a function usable by different task constructors.
Changed `_components._created_task_transformation_handler` to `_components._container_task_constructor`. Previously, the handler was receiving a `TaskSpec` instance. Now it receives `ComponentSpec` + arguments [+ `ComponentReference`].
Moved the `ContainerOp` construction handler setup to the `kfp.dsl.Pipeline` context class as planned.
Extracted `TaskSpec` creation to `_components._create_task_spec_from_component_and_arguments`.
Refactored `_dsl_bridge.create_container_op_from_task` to `_components._resolve_command_line_and_paths` which returns `_ResolvedCommandLineAndPaths`.
Renamed `_dsl_bridge._create_container_op_from_resolved_task` to `_dsl_bridge._create_container_op_from_component_and_arguments`.
The signature of `_components._resolve_graph_task` was changed and it now returns `_ResolvedGraphTask` instead of modified `TaskSpec`.
Some of the component tests still expect ContainerOp and its attributes.
These tests will be changed later.
* Adapted the _python_op tests
* Fixed linter failure
I do not want to add any top-level kfp imports in this file to prevent circular references.
* Added docstrings
* Fixed the return type forward reference
This makes the graph input references consistent with task output references.
This is a breaking change, but the graph components are not exposed in the documentation or samples yet.
Fixed accessing inputs and outputs without checking for None.
Fixed case where the default value of graph component input has to be passed to component as an argument.
Problem: It's hard to distinguish components loaded by name (e.g. using `ComponentStore`) from components that were never loaded (e.g. just created from python function).
`component_ref.name` was previously being set, since it was a required parameter.
`component_ref.name` should only be set if component was loaded by name.
* SDK - Components - Added type to TaskOutputReference
Now the task output references taken from TaskSpec instances can be
type-checked when passed to components.
* Renamed TypeType to TypeSpecType
Problem: When the user loads a component using the load_component function, the object they get back is a task factory function. Since it's a normal function object, the user cannot inspect any of the attributes of the component they just loaded (they can only see the name, description and input names). For example, the user cannot see the list of component outputs, the annotations, etc.
This change fixes the issue by adding the original component properties to the function object.
Example usage:
```python
train_op = load_component_from_url(...)
print(train_op.outputs)
```
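One way such properties can be attached is by setting attributes on the factory function itself. The sketch below assumes a dict-based spec and a hypothetical helper name, so treat the attribute set as illustrative rather than as the SDK's exact surface.

```python
def create_inspectable_task_factory(component_spec):
    """Build a task factory that also exposes the spec it was built from."""
    def task_factory(**arguments):
        return {"component": component_spec["name"], "arguments": arguments}

    # Functions are ordinary objects, so extra attributes can be attached.
    task_factory.__name__ = component_spec["name"]
    task_factory.__doc__ = component_spec.get("description")
    task_factory.component_spec = component_spec
    task_factory.outputs = component_spec.get("outputs", [])
    return task_factory


# Hypothetical component spec, used only for this demonstration.
train_op = create_inspectable_task_factory({
    "name": "Train",
    "description": "Trains a model.",
    "outputs": [{"name": "model"}],
})
```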
* SDK - Components - Improved serialization and deserialization of arguments and defaults
Properly serialize default values and passed arguments using the same code.
Check the types of passed argument values and issue warnings.
Improved argument reference type compatibility checking. When types do not match there is always either error or warning.
When creating a component from a Python function, the input types are now canonicalized.
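A simplified sketch of serializing defaults and passed arguments through the same code path, with a warning on a type mismatch. The converter table, the type names, and the rule that strings pass through untouched are assumptions for illustration only.

```python
import json
import warnings

# Hypothetical table: type name -> (expected Python type, serializer).
_CONVERTERS = {
    "String": (str, str),
    "Integer": (int, str),
    "Float": (float, str),
    "JsonObject": (dict, json.dumps),
}


def serialize_value(value, type_name=None):
    """Serialize a default or an argument value to a string."""
    if isinstance(value, str):
        return value  # assumed to be serialized already
    expected_type, serializer = _CONVERTERS.get(type_name, (object, str))
    if not isinstance(value, expected_type):
        warnings.warn(
            'Value {!r} does not look like type "{}".'.format(value, type_name)
        )
    return serializer(value)


# The same function handles an input default and a passed argument.
serialized_default = serialize_value(3, "Integer")
serialized_argument = serialize_value({"a": 1}, "JsonObject")
```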
* Addressed the feedback
* SDK - Refactoring - Replaced the TypeMeta class
The PipelineParam no longer exposes the private TypeMeta class
Fixes #1420
This refactoring PR is part of a series of PRs that unify the metadata and specification types.
* add core types and type checking function
* fix unit test bug
* avoid defining dynamic classes
* typo fix
* add component metadata format
* add a construct for the component decorator
* add default values for the meta classes
* add input/output types to the metadata
* add from_dict in TypeMeta
* small fix
* add unit tests
* use python struct for the openapi schema
* add default in parameter
* add default value
* remove the str restriction for the param default
* bug fix
* add pipelinemeta
* add pipeline metadata
* ignore annotation if it is not str/BaseType/dict
* update param name in the check_type functions
remove schema validators for GCRPath, and adjust for GCRPath, GCSPath
change _check_valid_dict to _check_valid_type_dict to avoid confusion
fix typo in the comments
adjust function order for readability
* remove default values for non-primitive types in the function signature
update the _check_valid_type_dict name
* pass metadata from component decorator and task factory to containerOp
* pass pipeline metadata to Pipeline
* fix unit test
* typo in the comments
* move the metadata classes to a separate module
* fix unit test
* small change
* add __eq__ to meta classes
not export _metadata classes
* nothing
* fix unit test
* unit test python component
* unit test python pipeline
* fix bug: duplicate variable of args
* fix unit tests
* move python_component and _component decorator in _component file
* remove the print
* change parameter default value to None
* add functools wraps around _component decorator
* TypeMeta accept both str and dict
* fix indent, add unit test for type as strings
* do not set default value for the name field in ParameterMeta, ComponentMeta, and PipelineMeta
* add type check in task factory
* output error message
* add type check in component decorator; move the metadata assignment out of the containerop __init__ function
* fix bug; add unit test
* add more unit tests
* more unit tests; fix bugs
* more unit tests; fix bugs
* add unit tests
* more unit tests
* add type check switch; add unit tests
* add compiler option for type check
* resolving pr comments
* add unit test for pipeline param check with component types; fix the bug; also fix the bug when there is not a single return annotation
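The commits above mention types given as either a string or a dict, plus a switch for turning the check off. A hedged sketch of that idea follows; the function names and the exact compatibility rule (equal name and properties, missing types always accepted) are assumptions, not the SDK's definition.

```python
def normalize_type(type_spec):
    """Return (name, properties) for a str or a single-key dict type spec."""
    if isinstance(type_spec, str):
        return type_spec, {}
    if isinstance(type_spec, dict) and len(type_spec) == 1:
        name, properties = next(iter(type_spec.items()))
        return name, dict(properties or {})
    raise TypeError("Unsupported type spec: {!r}".format(type_spec))


def types_are_compatible(argument_type, input_type):
    """Untyped values (None) are accepted; otherwise the specs must match."""
    if argument_type is None or input_type is None:
        return True
    return normalize_type(argument_type) == normalize_type(input_type)


def check_argument(argument_type, input_type, type_check=True):
    """Raise TypeError on a mismatch unless the check is switched off."""
    if type_check and not types_are_compatible(argument_type, input_type):
        raise TypeError(
            "Argument type {!r} is incompatible with input type {!r}".format(
                argument_type, input_type
            )
        )


same = types_are_compatible("GCSPath", {"GCSPath": {}})
different = types_are_compatible("Integer", "String")
```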
The zip-packed components are supported in all load_component APIs:
`kfp.components.load_component`
`kfp.components.load_component_from_file`
`kfp.components.load_component_from_url`
`kfp.components.ComponentStore.load_component`
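A small sketch of how zip-packed and plain component data could be handled uniformly once the bytes are available. The archive member name `component.yaml` and the helper name are assumptions for illustration.

```python
import io
import zipfile


def read_component_text(data, member_name="component.yaml"):
    """Return the component text from plain or zip-packed bytes."""
    stream = io.BytesIO(data)
    if zipfile.is_zipfile(stream):
        with zipfile.ZipFile(stream) as archive:
            return archive.read(member_name).decode("utf-8")
    return data.decode("utf-8")


# Build a tiny in-memory archive to exercise the zip branch.
component_text = "name: Hypothetical component\n"
packed = io.BytesIO()
with zipfile.ZipFile(packed, "w") as archive:
    archive.writestr("component.yaml", component_text)

from_zip = read_component_text(packed.getvalue())
from_plain = read_component_text(component_text.encode("utf-8"))
```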
This is needed for the future storage system based on volume mounts:
If outputs were written to files in the same dir (e.g. /outputs/out1.txt and /outputs/out2.txt), then we cannot separate them and mount to the downstream task containers independently.
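To illustrate the constraint, the sketch below gives each output its own directory so that each directory can be mounted separately. The `/outputs` root follows the example paths above, while the `data` file name and the sanitizing rule are hypothetical.

```python
import posixpath
import re


def generate_output_path(output_name, root="/outputs"):
    """Place every output in its own directory under root."""
    directory = re.sub(r"[^0-9A-Za-z]+", "_", output_name).strip("_")
    return posixpath.join(root, directory, "data")


paths = [generate_output_path(name) for name in ("out1", "out2")]
directories = [posixpath.dirname(path) for path in paths]
```

Since no two outputs share a parent directory, a volume mount can target one output without exposing the other.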
* Reworked the Component structures.
Rewrote parsing, type checking and serialization code.
Improved the graph component structures.
Added most of the needed k8s structures.
Added model validation (input/output existence etc).
Added task cycle detection and topological sorting to GraphSpec.
All container component tests now work.
Added some graph component tests.
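Task cycle detection and topological sorting can be sketched with a depth-first traversal. This is a generic illustration over a hypothetical `task -> upstream tasks` mapping, not the actual GraphSpec code.

```python
def topologically_sort_tasks(dependencies):
    """Return task names ordered so every task follows its upstream tasks.

    dependencies maps a task name to the names of the tasks it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    sorted_tasks = []
    state = {}  # task name -> "visiting" or "done"

    def visit(task, path):
        if state.get(task) == "done":
            return
        if state.get(task) == "visiting":
            cycle = path[path.index(task):] + [task]
            raise ValueError("Cycle detected: " + " -> ".join(cycle))
        state[task] = "visiting"
        for upstream in dependencies.get(task, ()):
            visit(upstream, path + [task])
        state[task] = "done"
        sorted_tasks.append(task)

    for task in dependencies:
        visit(task, [])
    return sorted_tasks


# Hypothetical graph: evaluate depends on train, which depends on preprocess.
order = topologically_sort_tasks({
    "evaluate": ["train", "preprocess"],
    "train": ["preprocess"],
    "preprocess": ["download"],
    "download": [],
})
```

A task that is reached again while still marked "visiting" lies on the current traversal path, which is exactly the condition for a cycle.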
* Fixed incompatibilities with python <3.7
* Added __init__.py to make the Travis tests work.
* Adding kubernetes structures to setup.py
* Addressed PR feedback: Renamed _original_names to _serialized_names
* Addressed PR feedback: Reduced indentation.
* Added descriptions for all component structures.
* Fixed a bug in ComponentSpec._post_init()
* Added documentation for ModelBase class and functions.
* Added __eq__/__ne__ and improved __repr__
* Added ModelBase tests
* Support replaceable arguments in command as well (besides arguments) in container op.
* Fix components builder.
* Fix tests.
* Follow up CR comments.
* Fix test.