* SDK - Components - Fixed Python components that use `\n`
The escape sequence was being replaced by the `echo` command.
Apparently, unlike the `echo` command in the `bash` shell, the `echo` command in the `sh` shell expands escape sequences by default and does not support an option to turn this off. (For some reason, the `-n` option works properly even though it should not.)
Fixes https://github.com/kubeflow/pipelines/issues/4939
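To see why this breaks components, here is a small sketch. It does not run `sh`. Instead it uses a deliberately simplified model of escape expansion, `expand_like_sh_echo` (hypothetical, and it only handles `\n`), and shows that Python source which is valid before the expansion becomes a syntax error afterwards.

```python
# Simplified model (assumption): sh's `echo` turns the two characters "\" + "n"
# into a real newline. Real implementations also expand other escapes (\t, \\, ...).
def expand_like_sh_echo(text: str) -> str:
    return text.replace("\\n", "\n")


# Component source that legitimately contains the escape sequence \n in a string literal.
SOURCE = 'print("line1\\nline2")\n'


def compiles(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<component>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(SOURCE))                       # True
print(compiles(expand_like_sh_echo(SOURCE)))  # False: the string literal is now split across two lines
```

The expansion turns `"line1\nline2"` into an unterminated string literal, which is how such components fail at runtime.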
* Fixed the test data
* Fixed the deprecated container component builder
* Fixed the new compiler test case
* Added test
The component specification has always supported component annotations, but there was no way to specify them for components generated from Python. This PR fixes that.
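A minimal sketch of what "annotations on a generated component" means for the resulting specification. The helper `build_component_spec` is hypothetical, and placing the annotations under `metadata.annotations` is an assumption about the spec layout; this is not the SDK's own generator.

```python
import json
from typing import Any, Dict, List, Mapping, Optional


def build_component_spec(
    name: str,
    image: str,
    command: List[str],
    annotations: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical generator for a container component spec."""
    spec: Dict[str, Any] = {
        "name": name,
        "implementation": {"container": {"image": image, "command": list(command)}},
    }
    if annotations:
        # Assumption: annotations are stored under metadata.annotations.
        spec["metadata"] = {"annotations": dict(annotations)}
    return spec


spec = build_component_spec(
    name="Add",
    image="python:3.7",
    command=["python3", "-c", "print(1 + 2)"],
    annotations={"author": "Jane Doe"},  # hypothetical annotation
)
print(json.dumps(spec, indent=2, sort_keys=True))
```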
* add placeholder to spec
* add output_directory to pipeline
* respect uri placeholder in file outputs
* wip: add data passing rewriting logic to respect the uri semantics
* merge input_uri and paths when instantiating ContainerOp
* fix
* fix workflow rewriting
* Add topology rewriting
* add a test case, and various fixes
* make the test case more complex
* Fix the case when working with OpsGroup
* Fix test case
* fix resolving test
* fix redundant cmd lines
* resolve comments
* fix file outputs
* resolve comments
* copy file outputs instead of modifying them in place.
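The URI-placeholder work above can be pictured with a small resolver. This is a sketch under stated assumptions: the `{"inputUri": ...}` / `{"outputUri": ...}` placeholder shapes follow the component spec's placeholder style, while the function name and the `<output_directory>/<task_name>/<output_name>` layout are hypothetical. It builds a new command line rather than modifying the original in place.

```python
from typing import Any, List, Mapping


def resolve_uri_placeholders(
    command: List[Any],
    input_uris: Mapping[str, str],
    output_directory: str,
    task_name: str,
) -> List[str]:
    """Return a NEW command line with URI placeholders replaced; `command` is not modified."""
    resolved: List[str] = []
    for arg in command:
        if isinstance(arg, dict) and "inputUri" in arg:
            # Inputs: use the URI that the upstream task's output was written to.
            resolved.append(input_uris[arg["inputUri"]])
        elif isinstance(arg, dict) and "outputUri" in arg:
            # Outputs: hypothetical layout <output_directory>/<task_name>/<output_name>.
            base = output_directory.rstrip("/")
            resolved.append(f"{base}/{task_name}/{arg['outputUri']}")
        else:
            resolved.append(str(arg))
    return resolved


command = ["trainer", "--data", {"inputUri": "dataset"}, "--model", {"outputUri": "model"}]
print(resolve_uri_placeholders(
    command, {"dataset": "gs://bucket/prep/dataset"}, "gs://bucket/run-1", "train"
))
```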
Currently, we're running the Python code inline using `python -c <code>`.
This has two issues:
1) Python does not show source code lines in exception stack traces.
2) `inspect.getsource` does not work. This method is used by PyTorch JIT, for example.
We solve these issues by writing the code into a file before executing it.
The disadvantage of the new approach is that it adds complexity, a filesystem write operation and also requires the `sh` executable to be present (we could replace it with python-based program if needed).
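The difference can be reproduced with a short sketch that runs the same program both ways. For simplicity it writes the file from Python itself rather than through an `sh` wrapper as the component does. Code run with `-c` gets the pseudo-filename `<string>`, so on the Python versions this change targeted `inspect.getsource` cannot find it; code run from a file has a real path on disk. The exact `-c` behaviour may vary between Python releases.

```python
import os
import subprocess
import sys
import tempfile

# A tiny program that reports where its code came from and tries to read its own source.
PROGRAM = """
import inspect

def f():
    return 42

print(f.__code__.co_filename)
try:
    print(inspect.getsource(f).splitlines()[0])
except OSError:
    print("<source unavailable>")
"""


def run_inline(code: str) -> str:
    """Old approach: python -c <code>."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout


def run_from_file(code: str) -> str:
    """New approach: write the code to a file first, then execute that file."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "program.py")
        with open(path, "w") as f:
            f.write(code)
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, check=True
        )
    return result.stdout


print(run_inline(PROGRAM))     # first line is "<string>"
print(run_from_file(PROGRAM))  # first line is the file path, second is "def f():"
```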
A recent PR added changes that architecturally belonged to a different module (the component bridge). This introduced unintended dependencies and coupling between the modules. This PR restores the module separation. It also makes the code simpler.
* Compile IR proto in setup.py
* compile to IR
* Fix importer node logic and lint
* cleanup and lint
* merge, undo setup.py change
* cleanup and lint
* remove currently unused code
* format _component_bridge.py
* cleanup and format
* cleanup
* upgrade protobuf in test
* restructure and test
* address review comments
* fix bug
* avoid f-string formatting
* address review comments
* limit the primitive types to only int, double, and string.
* Fix test for python3.5
* use instance_schema instead of schema_title
* add v2 to setup.py
* address review comments
* move the tests closer to the code
* add more tests
* cleanup and linting
* add more tests
* fix bug in input parameter connection
* linting
* restructure tests
* fix python3.5 test failure
* support outputs.parameters placeholder
* remove pipeline decorator from v2.dsl
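The "limit the primitive types to only int, double, and string" step can be sketched as a lookup table. The function name and the source type names (`Integer`, `Float`, `String`) are assumptions for illustration; this is not the compiler's actual code. Any other type name returns `None` here and is left to the caller (for example, to be handled as an artifact).

```python
from typing import Optional

# Hypothetical mapping from component type names to the three allowed primitive types.
_PRIMITIVE_TYPES = {
    "Integer": "INT",
    "Float": "DOUBLE",
    "String": "STRING",
}


def to_primitive_type(type_name: str) -> Optional[str]:
    """Return INT, DOUBLE or STRING, or None if the type is not one of the allowed primitives."""
    return _PRIMITIVE_TYPES.get(type_name)


print(to_primitive_type("Float"))  # DOUBLE
```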
Previously the process that was used to resolve a child task of a graph component was convoluted:
* Generate a dynamic task factory function for the child task component
* Convert input argument names from original to pythonic names
* Call the generated dynamic factory function using the python arguments to get back a task object
* Convert the task object outputs from pythonic back to original names (recently removed)
This PR significantly simplifies this process to just:
* Directly construct a task object based on the task component and the original arguments
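A minimal sketch of the simplified resolution. `ComponentSpec`, `TaskSpec` and `construct_task` are hypothetical stand-ins, not the SDK's classes. The point is that arguments stay keyed by the component's original input names, so no factory function and no pythonic name round trip are needed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass
class ComponentSpec:
    """Hypothetical, minimal stand-in for a component specification."""
    name: str
    input_names: List[str]


@dataclass
class TaskSpec:
    """Hypothetical task object: the component plus arguments keyed by ORIGINAL input names."""
    component: ComponentSpec
    arguments: Dict[str, Any] = field(default_factory=dict)


def construct_task(component: ComponentSpec, arguments: Mapping[str, Any]) -> TaskSpec:
    """Build the task directly from the component and the original argument names."""
    unknown = sorted(set(arguments) - set(component.input_names))
    if unknown:
        raise ValueError(f"{component.name} has no inputs named {unknown}")
    return TaskSpec(component=component, arguments=dict(arguments))


trainer = ComponentSpec(name="Train", input_names=["Training data", "Learning rate"])
task = construct_task(trainer, {"Training data": "gs://bucket/data.csv", "Learning rate": 0.1})
```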
* SDK - Components - Added Bool as a known type name
Some components are already using this type name and are starting to fail due to more strict type checking during constant argument serialization.
* Fixed syntax error
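A sketch of registering `Bool` as a known type name for constant argument serialization. The serializer table, the function names and the "reject unknown type names" behaviour are assumptions chosen to illustrate why an unregistered alias would start failing under strict checking.

```python
from typing import Any, Callable, Dict


def _serialize_bool(value: Any) -> str:
    if not isinstance(value, bool):
        raise TypeError(f"Expected a bool, got {type(value).__name__}")
    return str(value)  # "True" or "False"


def _serialize_str(value: Any) -> str:
    if not isinstance(value, str):
        raise TypeError(f"Expected a str, got {type(value).__name__}")
    return value


# Hypothetical table of known type names; "Bool" shares the "Boolean" serializer.
_SERIALIZERS: Dict[str, Callable[[Any], str]] = {
    "String": _serialize_str,
    "Boolean": _serialize_bool,
    "Bool": _serialize_bool,
}


def serialize_constant_argument(value: Any, type_name: str) -> str:
    serializer = _SERIALIZERS.get(type_name)
    if serializer is None:
        # Strict checking (assumption): unknown type names are rejected.
        raise TypeError(f"Unknown type name: {type_name!r}")
    return serializer(value)


print(serialize_constant_argument(True, "Bool"))  # True
```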
* add tests for pythonic and non-pythonic component outputs
* fix: graph component handling of non-pythonic container output names
Loading a container component from component.yaml creates both
pythonic and original output names. The graph component iterated over
all outputs and applied the pythonic-to-output conversion to every one.
If some of the names were not identical to their pythonic versions,
the lookup in the conversion table raised a KeyError.
This commit fixes this problem by using a default value for the lookup.
* remove depythonification of outputs - not needed anymore
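The KeyError and its fix can be sketched with a plain dictionary. `to_pythonic` is a simplified, hypothetical stand-in for the SDK's name conversion. Because the task exposes both spellings of each output, an original name such as `Output data` is not a key of the pythonic-to-original table; looking it up with a default (itself) avoids the error.

```python
import re
from typing import Dict, Iterable, Set


def to_pythonic(name: str) -> str:
    """Hypothetical conversion: lowercase, runs of non-alphanumerics become underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def collect_original_output_names(
    output_names: Iterable[str], pythonic_to_original: Dict[str, str]
) -> Set[str]:
    # Names missing from the table (the original, non-pythonic ones) map to themselves
    # instead of raising KeyError.
    return {pythonic_to_original.get(name, name) for name in output_names}


original_names = ["Output data", "metrics"]
pythonic_to_original = {to_pythonic(name): name for name in original_names}
# The loaded task exposes both the original and the pythonic spelling of each output.
task_output_names = ["Output data", "output_data", "metrics"]

print(sorted(collect_original_output_names(task_output_names, pythonic_to_original)))
# ['Output data', 'metrics']
```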
* Prepare the SDK docs environment to make it easier to understand how to build the docs locally so that they're consistent with ReadTheDocs.
* Clean up docstrings for kfp.Client
* Add in updates to the docs for compiler and components
* Update components area to add in code references and make formatting a little more consistent.
* Clean up containers, add in custom CSS to ensure we do not overflow on inline code blocks
* Remove unused kfp.notebook package links
* Clean up a few more errant references
* Clean up the DSL docs some more
* Update SDK docs for KFP extensions to follow Sphinx guidelines
* Clean up formatting of docstrings after Ark-Kun's comments
* SDK - Components - Replaced Kubernetes options with generic launcher options
This reduces the schema size and makes the task launcher options more flexible.
* Removed the launcherOptions for now
Previously, the default image was an old version of the TensorFlow image. That image is now outdated. It's also framework-specific and pretty big.
We're switching to the official python image which is small, official and framework-agnostic.
The users can easily switch to the old behavior by just specifying `base_image='tensorflow/tensorflow:1.13.2-py3'` during the component creation.
* SDK - Components - Stabilize JSON serialization by sorting keys
Otherwise serialization of the default values of the component/pipeline inputs is unstable on Python 3.5.
* Fixed the test data
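The idea behind the sorting fix is that `json.dumps(..., sort_keys=True)` makes the output independent of dictionary key order, which was not guaranteed on Python 3.5. The helper name below is hypothetical.

```python
import json
from typing import Any


def serialize_stably(obj: Any) -> str:
    """Serialize to JSON with keys sorted so the result does not depend on dict ordering."""
    return json.dumps(obj, sort_keys=True)


# The same mapping built in two different insertion orders.
first = {"b": 1, "a": 2}
second = {"a": 2, "b": 1}

print(serialize_stably(first))   # {"a": 2, "b": 1}
print(serialize_stably(second))  # {"a": 2, "b": 1}
```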
In some cases the input and output names need to be converted (for example, the input names need to be converted to python function parameter names).
With naive renaming, multiple inputs might be mapped to the same parameter name in some edge cases. The `generate_unique_name_conversion_table` creates a correct mapping.
However, in some really rare cases the resulting mapping could be confusing since it might rename an input whose name was already a correct parameter name and map a different input name to that parameter. E.g. {'AAA' -> 'aaa', 'aaa' -> 'aaa_2'}.
This PR fixes that: names that do not change when `conversion_func` is applied remain unchanged in the mapping. E.g. {'AAA' -> 'aaa_2', 'aaa' -> 'aaa'}.
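A sketch of the fixed behaviour, written from the description above rather than copied from the SDK's `generate_unique_name_conversion_table`. The function name `unique_name_conversion_table` and the `_2`, `_3`, ... suffix scheme are illustrative assumptions. Names that are already fixed points of `conversion_func` are reserved first; every other name then receives the first free converted name.

```python
from typing import Callable, Dict, Iterable


def unique_name_conversion_table(
    names: Iterable[str], conversion_func: Callable[[str], str]
) -> Dict[str, str]:
    names = list(names)
    table: Dict[str, str] = {}
    used = set()
    # Pass 1: names that do not change under conversion keep their own name.
    for name in names:
        if conversion_func(name) == name:
            table[name] = name
            used.add(name)
    # Pass 2: all other names get the converted name, with a numeric suffix on collision.
    for name in names:
        if name in table:
            continue
        base = conversion_func(name)
        candidate, suffix = base, 2
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        table[name] = candidate
        used.add(candidate)
    return table


print(unique_name_conversion_table(["AAA", "aaa"], str.lower))
# {'aaa': 'aaa', 'AAA': 'aaa_2'}
```

Reserving the fixed points before assigning any suffix is what prevents `aaa` from being renamed just because `AAA` happened to come first.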
* SDK - Components - Calculate component hash digest
The digest is calculated when loading the component from a URL, a file or text.
Slightly refactored component loading - streams are no longer used, only bytes.
TODO: Calculate the digest if missing
TODO: Report possible digest conflicts
* Updated the test graph component
* Using the actual digest in the test
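A sketch of bytes-only loading with a digest. The function names are hypothetical, and SHA-256 is an assumption about the hash used. Every source (text, file or URL) is reduced to raw bytes first, and the digest is computed over those exact bytes.

```python
import hashlib
from pathlib import Path
from typing import Tuple
from urllib.request import urlopen


def _digest(component_bytes: bytes) -> str:
    # Assumption: SHA-256 over the exact bytes that were loaded.
    return hashlib.sha256(component_bytes).hexdigest()


def load_component_bytes_from_text(text: str) -> Tuple[bytes, str]:
    data = text.encode("utf-8")
    return data, _digest(data)


def load_component_bytes_from_file(path: str) -> Tuple[bytes, str]:
    data = Path(path).read_bytes()  # plain bytes, no stream object kept around
    return data, _digest(data)


def load_component_bytes_from_url(url: str) -> Tuple[bytes, str]:
    with urlopen(url) as response:
        data = response.read()
    return data, _digest(data)


data, digest = load_component_bytes_from_text("name: Add\n")
print(digest)
```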
* SDK - Prioritize lib2to3 when stripping type annotations
It's a standard Python library (although not well supported) and it does not leave trailing spaces.
* Fixed compiler test data
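For comparison, here is a different technique from the `lib2to3` approach above: stripping function annotations with the standard `ast` module (Python 3.9+ for `ast.unparse`). Unlike a concrete-syntax tool such as `lib2to3`, this regenerates the code, so comments and the original formatting are lost. That trade-off is one reason to prefer a formatting-preserving tool. The function names are hypothetical, and only function signatures are handled.

```python
import ast


class _StripAnnotations(ast.NodeTransformer):
    """Remove parameter and return annotations from every (async) function definition."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # handle nested functions first
        node.returns = None
        args = node.args
        for arg in args.posonlyargs + args.args + args.kwonlyargs:
            arg.annotation = None
        if args.vararg is not None:
            args.vararg.annotation = None
        if args.kwarg is not None:
            args.kwarg.annotation = None
        return node

    visit_AsyncFunctionDef = visit_FunctionDef


def strip_type_annotations(source: str) -> str:
    tree = _StripAnnotations().visit(ast.parse(source))
    return ast.unparse(tree)


print(strip_type_annotations("def f(a: int, b: str = 'x') -> int:\n    return a\n"))
# def f(a, b='x'):
#     return a
```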
This PR is a refactoring.
It splits each `load_component*` method in `_components` and `_component_store` into loading the spec (`_load_component_spec*`) and creating the task factory from that spec.
This makes it easier to load the spec without having to create task factory functions.
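A sketch of the two-step structure. The function names only mirror the `_load_component_spec*` pattern and are hypothetical, and JSON stands in for YAML so the example needs only the standard library. The spec can be loaded and inspected on its own, and the task factory is an optional second step.

```python
import json
from typing import Any, Callable, Dict


def _load_component_spec_from_text(text: str) -> Dict[str, Any]:
    """Step 1: parse and validate the spec only (JSON used here instead of YAML)."""
    spec = json.loads(text)
    if "name" not in spec:
        raise ValueError("Component spec must have a name")
    return spec


def _create_task_factory_from_spec(spec: Dict[str, Any]) -> Callable[..., Dict[str, Any]]:
    """Step 2: build a task factory from an already loaded spec."""
    input_names = {inp["name"] for inp in spec.get("inputs", [])}

    def task_factory(**arguments: Any) -> Dict[str, Any]:
        unknown = sorted(set(arguments) - input_names)
        if unknown:
            raise TypeError(f"Unexpected arguments: {unknown}")
        return {"component": spec["name"], "arguments": arguments}

    return task_factory


def load_component_from_text(text: str) -> Callable[..., Dict[str, Any]]:
    """The public loader is now just the two steps composed."""
    return _create_task_factory_from_spec(_load_component_spec_from_text(text))


TEXT = '{"name": "Add", "inputs": [{"name": "a"}, {"name": "b"}]}'
spec = _load_component_spec_from_text(TEXT)  # spec only, no factory created
add_op = load_component_from_text(TEXT)
print(add_op(a=1, b=2))  # {'component': 'Add', 'arguments': {'a': 1, 'b': 2}}
```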