This PR fixes a bug in Airflow op creation.
The `_run_airflow_op` helper function was not captured along with the `_run_airflow_op_closure` function because they belonged to different modules (`_run_airflow_op_closure` was module-less).
This was not discovered during notebook testing of the code, since in that environment `_run_airflow_op` was also module-less, as it was defined in a notebook rather than in a .py file.
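For context, this matches cloudpickle's capture rules (the commits below adopt cloudpickle for code capture): module-less functions are pickled by value, while functions belonging to an importable module are pickled by reference only, so their code does not travel with the payload. A rough sketch of the distinction:

```python
import cloudpickle

# Pretend this helper lives in an importable module (a .py file):
# cloudpickle would then store only a reference to it, not its code.
def _run_airflow_op():
    pass

def _run_airflow_op_closure():
    # Looks up the helper by name at call time. If the helper was pickled
    # by reference and its module is absent in the target container,
    # running the unpickled closure fails.
    return _run_airflow_op()

# In a notebook BOTH functions are module-less and pickled by value,
# which is why the bug did not show up during notebook testing.
payload = cloudpickle.dumps(_run_airflow_op_closure)
```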
* Lint Python code for undefined names
* Exclude tfdv.py to work around an overzealous pytest
* Fixup for tfdv.py
* SDK - Lightweight - Added support for "None" default values
Previously, it was impossible to pass `None` to components, since it was converted to the string `"None"`.
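A minimal sketch of what this enables (the function and names are illustrative):

```python
import kfp.components as comp

def greet(name: str = None) -> str:
    # With this fix, an omitted argument arrives as a real None,
    # not as the string 'None'.
    return 'Hello, {}!'.format(name if name is not None else 'world')

greet_op = comp.func_to_container_op(greet)
```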
* is_required = not input.optional for now
As requested by @gaoning777.
This is required to correctly handle `None` arguments and `None` default values (also needed for optional and variable-number inputs).
It's easier to understand and generates better command-line code.
* SDK - Refactored the code in kfp.components._python_op._capture_function_code_using_cloudpickle
* SDK/Lightweight - Added Python version compatibility checks
See my compatibility analysis: https://github.com/cloudpipe/cloudpickle/issues/293
I introduced code pickling to capture dependencies in https://github.com/kubeflow/pipelines/pull/1372
Later I discovered that there is a serious opcode incompatibility between Python versions 3.5 and 3.6+. See my analysis of the issue: https://github.com/cloudpipe/cloudpickle/issues/293
Due to this issue, I decided to switch back to source code copying by default and to continue improving it.
Until we stop supporting Python 3.5 (https://github.com/kubeflow/pipelines/pull/668), it's too dangerous to use code pickling by default.
Code pickling can be enabled by specifying `pickle_code=True` when calling `func_to_container_op`.
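For reference, opting into pickling might look as follows (a sketch only; the exact parameter name and signature may differ across SDK versions):

```python
import kfp.components as comp

def add(a: float, b: float) -> float:
    '''Returns the sum of two numbers.'''
    return a + b

# Source-code copying is the default; code pickling is opt-in,
# per the flag described above.
add_op = comp.func_to_container_op(add, pickle_code=True)
```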
Due to its nature, Argo will replace any strings it encounters
that are enclosed in double curly braces, which would make the code
non-executable. To work around this, the code is encoded in the Argo
YAML template and decoded on the fly, before execution.
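A minimal sketch of the encode/decode idea, assuming a base64 encoding (the actual template may encode differently):

```python
import base64

code = "print('hello {{world}}')"  # would be mangled by Argo as-is

# At compile time: embed only the encoded form in the Argo YAML,
# so Argo never sees literal {{...}} sequences.
encoded = base64.b64encode(code.encode('utf-8')).decode('ascii')

# At run time, inside the container: decode and execute.
decoded = base64.b64decode(encoded).decode('utf-8')
exec(decoded)
```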
* SDK - Controlling which modules are captured with Lightweight components
All func_to_* functions now accept the `modules_to_capture` parameter: a list of module names whose code will be captured (instead of just referenced) during the dependency scan. By default, `func.__module__` is captured.
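A hedged usage sketch (`my_helpers` is a hypothetical module):

```python
import kfp.components as comp
# import my_helpers  # hypothetical module used by the function

def process(x: int) -> int:
    # return my_helpers.transform(x)
    return x

# Capture my_helpers by value instead of referencing it by name,
# so its code travels with the component.
process_op = comp.func_to_container_op(
    process,
    modules_to_capture=['my_helpers'],
)
```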
* Described the behavior more in depth.
* Added a test to check that only dependencies are captured
* Transitively capturing code dependencies
Using cloudpickle.
* Got rid of func_type_declarations_code variable
* Extracted the function code extraction logic into separate functions
* Improved support for capturing module-level dependencies
* Added test for capturing module-level dependencies
* Removed the _capture_function_code_using_source_copy function
As requested by Ning
* SDK - Made ComponentSpec.implementation field optional
Improved the error message when trying to convert tasks to ContainerOp.
* Switched from attribute checking to type checking
* Feature: sidecar for ContainerOp
* replace f-strings with str.format for compatibility with Python 3.5 (f-strings require Python 3.6+)
* ContainerOp can now be updated with any k8s V1Container attributes, as well as with sidecars via the `Sidecar` class. ContainerOp accepts `PipelineParam` in any valid k8s property.
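A rough sketch of both mechanisms (image names and commands are illustrative):

```python
import kfp.dsl as dsl

@dsl.pipeline(name='sidecar-example', description='Illustrative only')
def sidecar_pipeline():
    op = dsl.ContainerOp(
        name='main',
        image='alpine:3.9',
        command=['sh', '-c', 'echo hello'],
        sidecars=[dsl.Sidecar(name='helper',
                              image='busybox',
                              command=['sleep', '60'])],
    )
    # Any k8s V1Container attribute can be set on the underlying container.
    op.container.image_pull_policy = 'Always'
```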
* WIP: fix conflicts and bugs with recent master. TODO: more complex template with pipeline params
* fix proxy args
* Fixed to work with latest master head
* Added container_kwargs to ContainerOp to pass in k8s container kwargs
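For illustration, `container_kwargs` forwards keyword arguments to the underlying k8s container model (the values here are made up):

```python
import kfp.dsl as dsl
from kubernetes.client import V1EnvVar

@dsl.pipeline(name='container-kwargs-example', description='Illustrative only')
def kwargs_pipeline():
    # container_kwargs are passed through to the k8s V1Container.
    dsl.ContainerOp(
        name='echo',
        image='alpine:3.9',
        command=['sh', '-c', 'echo "$GREETING"'],
        container_kwargs={'env': [V1EnvVar(name='GREETING', value='hello')]},
    )
```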
* Fix comment bug; updated the example in the ContainerOp docstring
* fix copyright year
* expose match_serialized_pipelineparam as public for the compiler to process serialized pipeline params
* fixed pydoc example and removed unnecessary ContainerOp.container.parent
* Fix conflicts in compiler tests
* add core types and type checking function
* fix unit test bug
* avoid defining dynamic classes
* typo fix
* add component metadata format
* add a construct for the component decorator
* add default values for the meta classes
* add input/output types to the metadata
* add from_dict in TypeMeta
* small fix
* add unit tests
* use python struct for the openapi schema
* add default in parameter
* add default value
* remove the str restriction for the param default
* bug fix
* add pipelinemeta
* add pipeline metadata
* ignore annotation if it is not str/BaseType/dict
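To illustrate the three accepted annotation forms, a hedged sketch (component and type names are illustrative):

```python
import kfp.dsl as dsl
from kfp.dsl.types import Integer

@dsl.component
def typed_op(a: 'GCSPath',                        # plain string type name
             b: Integer(),                        # core type (a BaseType)
             c: {'CustomType': {'prop': 'val'}}): # dict with a type schema
    return dsl.ContainerOp(
        name='typed-op',
        image='alpine',
        command=['sh', '-c', 'echo "%s %s %s"' % (a, b, c)],
    )
```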
* update param name in the check_type functions
remove schema validators for GCRPath, and adjust for GCRPath, GCSPath
change _check_valid_dict to _check_valid_type_dict to avoid confusion
fix typo in the comments
adjust function order for readability
* remove default values for non-primitive types in the function signature
update the _check_valid_type_dict name
* pass metadata from component decorator and task factory to containerOp
* pass pipeline metadata to Pipeline
* fix unit test
* typo in the comments
* move the metadata classes to a separate module
* fix unit test
* small change
* add __eq__ to meta classes
do not export the _metadata classes
* nothing
* fix unit test
* unit test python component
* unit test python pipeline
* fix bug: duplicated `args` variable
* fix unit tests
* move python_component and _component decorator in _component file
* remove the print
* change parameter default value to None
* add functools wraps around _component decorator
* TypeMeta accepts both str and dict
* fix indent, add unit test for type as strings
* do not set default value for the name field in ParameterMeta, ComponentMeta, and PipelineMeta
* add type check in task factory
* output error message
* add type check in component decorator; move the metadata assignment out of the containerop __init__ function
* fix bug; add unit test
* add more unit tests
* more unit tests; fix bugs
* more unit tests; fix bugs
* add unit tests
* more unit tests
* add type check switch; add unit tests
* add compiler option for type check
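A hedged sketch of the switch (assuming the option is exposed as a `type_check` argument to `Compiler.compile`, per this commit; the default may differ by version):

```python
import kfp.compiler as compiler
import kfp.dsl as dsl

@dsl.pipeline(name='typed-pipeline', description='Illustrative only')
def my_pipeline():
    pass

# Disable type checking for this compilation only.
compiler.Compiler().compile(my_pipeline, 'pipeline.tar.gz', type_check=False)
```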
* resolving pr comments
* add unit test for pipeline param check with component types; fix the bug; also fix the bug when there is not a single return annotation
Zip-packed components are supported in all load_component APIs:
`kfp.components.load_component`
`kfp.components.load_component_from_file`
`kfp.components.load_component_from_url`
`kfp.components.ComponentStore.load_component`
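For example (the paths and URL are hypothetical), a .zip archive containing component.yaml loads the same way as a bare component.yaml:

```python
import kfp.components as comp

op_from_file = comp.load_component_from_file('my_component.zip')
op_from_url = comp.load_component_from_url(
    'https://example.com/my_component.zip')
```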
When the DSL bridge code was written, ContainerOp did not support env, so we did not pass it. Now we're adding the passing code.
Added a test that checks that the env variables reach the ContainerOp.
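A rough sketch of the behavior under test (the component text is illustrative):

```python
import kfp.components as comp

# A component whose container sets an env var; the DSL bridge now
# forwards it to the generated ContainerOp.
component_text = '''\
name: Env example
implementation:
  container:
    image: alpine
    command: [sh, -c, 'echo "$MSG"']
    env:
      MSG: Hello
'''
env_op = comp.load_component_from_text(component_text)
# Instantiating env_op() inside a pipeline should now yield a
# ContainerOp whose container env includes MSG.
```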
Ultimately, a command line is an array of strings. Component YAML files should specify the arguments as strings instead of having the Python SDK do the conversion in some cases.
This is needed for the future storage system based on volume mounts:
if outputs were written to files in the same directory (e.g. /outputs/out1.txt and /outputs/out2.txt), we could not separate them and mount them into the downstream task containers independently.