
56 Commits

Author SHA1 Message Date
Alexey Volkov b9aa106bb5
SDK - Prioritize lib2to3 when stripping type annotations (#3724)
* SDK - Prioritize lib2to3 when stripping type annotations

It's part of the Python standard library (although not well supported), and it does not leave trailing spaces.

* Fixed compiler test data
2020-05-11 18:44:20 -07:00
Alexey Volkov 5ff7a65a0c
SDK - Components - Fixed bug in _strip_type_hints_using_lib2to3 (#3679) 2020-05-04 22:41:08 -07:00
Alexey Volkov 9619655ed5
SDK - Enabled file inputs to be optional (#3620)
* SDK - Enabled file inputs to be optional

* Added unit tests
2020-04-27 19:34:04 -07:00
Alexey Volkov be12ccf2a1
SDK - Moved the @python_component decorator test to dsl tests (#3324)
* SDK - Moved the @python_component decorator test to dsl tests

* Deprecate @python_component
2020-03-21 08:14:43 -07:00
Alexey Volkov 119e329108
SDK - Components - Fixed handling collection return values (#3263)
* SDK - Components - Fixed handling collection return values

Fixes https://github.com/kubeflow/pipelines/issues/3262

* Fixed the tests
2020-03-12 23:50:39 -07:00
Alexey Volkov 578d8de91d
SDK - Reduce python component limitations - no import errors for cust… (#3106)
* SDK - Reduce python component limitations - no import errors for custom type annotations

By default, create_component_from_func copies the source code of the function and creates a component using that source code. No global imports are captured. This is a problem for the function definition itself: any annotation that uses a type that needs to be imported causes an error. There were some special provisions for NamedTuple, InputPath and OutputPath, but even they were brittle (for example, "typing.NamedTuple" or "components.InputPath" annotations still caused failures at runtime).

This commit fixes the issue by stripping the type annotations from function declarations.

Fixes cases that were failing before:

```python
import typing
import collections

MyFuncOutputs = typing.NamedTuple('Outputs', [('sum', int), ('product', int)])

@create_component_from_func
def my_func(
    param1: CustomType,  # This caused failure previously
    param2: collections.OrderedDict,  # This caused failure previously
) -> MyFuncOutputs: # This caused failure previously
    pass
```

* Fixed the compiler tests

* Fixed crashes on print function

Code `print(line, end="")` was causing an error: "lib2to3.pgen2.parse.ParseError: bad input: type=22, value='=', context=('', (2, 15))"

* Using the strip_hints library to strip the annotations

* Updating test workflow yamls

* Workaround for bug in untokenize

* Switched to the new strip_string_to_string method

* Fixed typo.

Co-Authored-By: Jiaxiao Zheng <jxzheng@google.com>
2020-02-24 20:50:48 -08:00
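The annotation stripping described above can be sketched with the standard `ast` module. This is a swapped-in technique, not the strip_hints or lib2to3 approach the commits used, and it requires Python 3.9+ for `ast.unparse`. `strip_type_annotations` is a hypothetical helper. Unlike the tools in the commit, it does not preserve comments or the original formatting.

```python
import ast
import textwrap


def strip_type_annotations(source: str) -> str:
    """Hypothetical sketch: remove parameter and return annotations from function definitions."""
    tree = ast.parse(textwrap.dedent(source))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.returns = None  # drop "-> MyFuncOutputs"
            arguments = node.args
            all_args = arguments.posonlyargs + arguments.args + arguments.kwonlyargs
            for extra in (arguments.vararg, arguments.kwarg):
                if extra is not None:
                    all_args.append(extra)
            for arg in all_args:
                arg.annotation = None  # drop ": CustomType" and similar
    # Regenerate source code; comments and layout are not preserved
    return ast.unparse(tree)


source = '''
def my_func(
    param1: CustomType,
    param2: collections.OrderedDict,
) -> MyFuncOutputs:
    return param1
'''
stripped = strip_type_annotations(source)
namespace = {}
# With eagerly evaluated annotations, executing the original source would raise NameError
exec(stripped, namespace)
```
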
Alexey Volkov 7ee3244f5b
SDK - Components - Fixed dict-style type annotations (#3107)
Refactored `_data_passing.py` interface to expose functions instead of dictionaries.
2020-02-18 20:40:25 -08:00
Alexey Volkov c83aff2738
SDK - Components - Made it easier to access component spec classes (#2860)
* SDK - Components - Made it easier to access component spec classes

* Updated the imports
2020-01-31 11:41:21 -08:00
Alexey Volkov 6c72cc874a SDK - Components - Added the create_component_from_func alias (#2911)
Added the `create_component_from_func` function as an alias for `func_to_container_op`.
It behaves exactly the same, but its name no longer implies that you will always get a `ContainerOp` from it.
Some function parameters are not exposed yet, as they're not widely used and might be deprecated in the future.
2020-01-27 17:41:38 -08:00
Alexey Volkov 27f7e77356 SDK - Unified the function signature parsing implementations (#2689)
* Replaced `_instance_to_dict(obj)` with `obj.to_dict()`

* Fixed the capitalization in _python_function_name_to_component_name
It now only changes the case of the first letter.

* Replaced the _extract_component_metadata function with _extract_component_interface

* Stopped adding newline to the component description.

* Handling None inputs and outputs

* Not including empty inputs and outputs in component spec

* Renamed the private attributes that the @pipeline decorator sets

* Changed _extract_pipeline_metadata to use _extract_component_interface

* Fixed issues based on feedback
2019-12-27 10:05:40 -08:00
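A signature analyzer in the spirit of `_extract_component_interface` can be sketched with `inspect.signature`. This is not the SDK's implementation. `extract_interface`, the dictionary layout and the single-output name `'Output'` are assumptions. The sketch follows three of the behaviors listed above: `-> None` means no outputs, a NamedTuple return type yields one output per field, and empty input/output lists are omitted.

```python
import inspect
from typing import Any, Dict, List


def extract_interface(func) -> Dict[str, Any]:
    """Hypothetical sketch: inputs from parameters, outputs from the return annotation."""
    signature = inspect.signature(func)
    inputs: List[Dict[str, Any]] = []
    for parameter in signature.parameters.values():
        input_spec: Dict[str, Any] = {'name': parameter.name}
        if parameter.annotation is not inspect.Parameter.empty:
            input_spec['type'] = getattr(parameter.annotation, '__name__', str(parameter.annotation))
        if parameter.default is not inspect.Parameter.empty:
            input_spec['default'] = parameter.default
            input_spec['optional'] = True
        inputs.append(input_spec)

    outputs: List[Dict[str, Any]] = []
    return_annotation = signature.return_annotation
    # A missing annotation and "-> None" both mean the function has no outputs
    if return_annotation is not inspect.Signature.empty and return_annotation is not None:
        field_names = getattr(return_annotation, '_fields', None)
        if field_names is not None:
            # NamedTuple return type: one output per field
            outputs = [{'name': field_name} for field_name in field_names]
        else:
            outputs = [{'name': 'Output'}]

    interface: Dict[str, Any] = {'name': func.__name__}
    # Empty input and output lists are left out entirely
    if inputs:
        interface['inputs'] = inputs
    if outputs:
        interface['outputs'] = outputs
    return interface
```
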
Alexey Volkov 605ef804c6 Fixed the capitalization in _python_function_name_to_component_name (#2688)
It now only changes the case of the first letter.
2019-12-10 12:00:11 -08:00
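One way to change only the case of the first letter is sketched below. The helper name is hypothetical, and the underscore-to-space conversion is an assumption about how the component name is built. The contrast with `str.capitalize()` shows the point of the fix: `capitalize()` also lowercases every other letter, which mangles names like `XGBoost`.

```python
import re


def function_name_to_component_name(python_name: str) -> str:
    """Hypothetical sketch: 'train_XGBoost_model' -> 'Train XGBoost model'."""
    # Underscores become spaces; runs of spaces collapse; edges are trimmed
    name_with_spaces = re.sub(' +', ' ', python_name.replace('_', ' ')).strip()
    if not name_with_spaces:
        return python_name
    # Only the first letter changes case; the rest is kept verbatim
    return name_with_spaces[0].upper() + name_with_spaces[1:]
```
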
Alexey Volkov 61506a0e88 SDK - Components - Fixed YAML formatting for some components (#2529)
* SDK - Components - Fixed YAML formatting for some components

This fixes formatting for components whose function does not have a return annotation.
The low-level cause of the issue: trailing whitespace when there are no serializers.
Trailing whitespace triggers ugly YAML string formatting.

* Addressed feedback
2019-11-07 14:48:19 -08:00
Alexey Volkov 1282f16335 SDK - Python components - Fixed bug when mixing file outputs with return value outputs (#2473) 2019-10-23 19:45:05 -07:00
Alexey Volkov f4d689b4ed SDK - Python components - Fixed handling multiline decorators (#2345)
* SDK - Python components - Fixed handling multiline decorators

* Switched to using dedent

* Added error checking

* Testing multiline decorator

* Test calling the component created from decorated function

Also fixed `helper_test_component_against_func_using_local_call`.
2019-10-16 12:17:29 -07:00
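The dedent-then-parse approach for multiline decorators can be sketched as follows. `strip_decorators` is a hypothetical helper, and the sketch assumes Python 3.8+, where `FunctionDef.lineno` points at the `def` line rather than at the first decorator. Because the cut is made at that line, a decorator spanning several lines is dropped as a whole.

```python
import ast
import textwrap


def strip_decorators(function_source: str) -> str:
    """Hypothetical sketch: return the function source starting at its `def` line."""
    source = textwrap.dedent(function_source)
    module = ast.parse(source)
    function_defs = [
        node for node in module.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    # Error checking: the source must contain exactly one top-level function
    if len(function_defs) != 1:
        raise ValueError('Expected exactly one top-level function definition')
    def_line = function_defs[0].lineno  # 1-based line of the `def` keyword
    return '\n'.join(source.splitlines()[def_line - 1:])
```
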
Alexey Volkov 8b0cb8a5b5 SDK - Components - Deprecate the get and set methods for default image in favor of plain variable (#2257) 2019-10-04 15:35:12 -07:00
Alexey Volkov b2f1d0071f SDK - Components - Added the ComponentSpec.save method (#2264)
* SDK - Components - Added the ComponentSpec.save method

* Fixed write call
2019-10-03 15:25:55 -07:00
Alexey Volkov c676b838ef SDK - Lightweight - Added package installation support to func_to_container_op (#2245)
* SDK - Refactoring - Passing the parameters explicitly in python_op.
This helps avoid problems when new parameters are added.

* SDK - Components - Added package installation support to func_to_container_op

Example:
```python
op = func_to_container_op(my_func, packages_to_install=['pandas==0.24'])
```

* Make pip quieter

* Added the test_packages_to_install_feature test
2019-09-30 19:13:32 -07:00
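A rough idea of what `packages_to_install` could expand to in the container command is sketched below. The exact command the SDK generates is not shown in this log, so `make_pip_install_snippet` and the `--user` fallback are assumptions. `--quiet` corresponds to the "Make pip quieter" change.

```python
import shlex
from typing import List


def make_pip_install_snippet(packages_to_install: List[str]) -> str:
    """Hypothetical: shell snippet that installs packages before the component program runs."""
    if not packages_to_install:
        return ''
    # Quote each requirement so specifiers like 'numpy<2' survive the shell
    quoted = ' '.join(shlex.quote(package) for package in packages_to_install)
    pip = 'python3 -m pip install --quiet --no-warn-script-location'
    # Retry as a per-user install if the system site-packages directory is not writable
    return '({pip} {pkgs} || {pip} {pkgs} --user)'.format(pip=pip, pkgs=quoted)
```
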
Alexey Volkov 06f9322a78 SDK - Lightweight - Convert the names of file inputs and outputs (#2260)
* SDK - Lightweight - Convert the names of file inputs and outputs

Removing the "_path" and "_file" suffixes from the names of file inputs and outputs.
Problem: When accepting file inputs (outputs), the function inside the component receives file paths (or file streams), so it's natural to call the function parameter "something_file_path" (e.g. model_file_path or number_file_path).
But from the outside perspective, there are no files or paths - the actual data objects (or references to them) are passed in.
It looks very strange when argument passing code looks like this: `component(number_file_path=42)`. This looks like an error since 42 is not a path. It's not even a string.
It's much more natural to strip the "_file" or "_path" suffixes from the names of file inputs and outputs. Then the argument passing code looks natural: `component(number=42)`.

* Removed the _FEATURE_STRIP_FILE_IO_NAME_PARTS feature switch
2019-09-30 16:35:32 -07:00
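The renaming rule can be sketched as below. The helper name and the order in which suffixes are checked are assumptions. Checking `_path` before `_file` lets `number_file_path` reduce all the way to `number`.

```python
def strip_file_io_suffixes(parameter_name: str) -> str:
    """Hypothetical sketch: 'number_file_path' -> 'number'."""
    name = parameter_name
    for suffix in ('_path', '_file'):
        # Never strip a name down to nothing
        if name.endswith(suffix) and len(name) > len(suffix):
            name = name[:-len(suffix)]
    return name
```
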
Alexey Volkov 3caba4e06f SDK - Lightweight - Added support for file outputs (#2221)
Lightweight components now allow the function to mark some outputs that it wants to produce by writing data to files, rather than returning them as in-memory data objects.
This is useful when the data is expected to be big.

Example 1 (writing a large amount of data to the output file at the provided path):
```python
@func_to_container_op
def write_big_data(big_file_path: OutputPath(str)):
    with open(big_file_path, 'w') as big_file:  # the file must be opened for writing
        for i in range(1000000):
            big_file.write('Hello world\n')

```
Example 2 (writing a large amount of data to the provided output file stream):
```python
@func_to_container_op
def write_big_data(big_file: OutputTextFile(str)):
    for i in range(1000000):
        big_file.write('Hello world\n')
```
2019-09-24 18:11:58 -07:00
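To try a file-output function locally, something has to provide the output path and read the result back. `call_with_output_path` is a hypothetical harness and does not reproduce the SDK's generated wrapper. The `OutputPath` annotation is omitted so the sketch runs without the SDK installed.

```python
import os
import tempfile


def call_with_output_path(func, parameter_name: str, **kwargs) -> str:
    """Hypothetical local harness: give `func` a fresh output path, then read back what it wrote."""
    with tempfile.TemporaryDirectory() as temp_dir:
        output_path = os.path.join(temp_dir, 'outputs', parameter_name, 'data')
        # The function only receives a path, so the caller makes sure the parent directory exists
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        func(**{parameter_name: output_path}, **kwargs)
        with open(output_path) as output_file:
            return output_file.read()


def write_big_data(big_file_path: str, line_count: int = 3):
    # Same shape as Example 1, with a small line count for local testing
    with open(big_file_path, 'w') as big_file:
        for _ in range(line_count):
            big_file.write('Hello world\n')
```
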
Alexey Volkov 2510a690f2 SDK - Lightweight - Added support for file inputs (#2207)
Lightweight components now allow the function to mark some inputs that it wants to consume as files, rather than as in-memory data objects.
This is useful when the data is expected to be big.

Example 1:
```python
def consume_big_file_path(big_file_path: InputPath(str)) -> int:
    line_count = 0
    with open(big_file_path) as f:
        while f.readline():
            line_count = line_count + 1
    return line_count
```
Example 2:
```python
def consume_big_file(big_file: InputTextFile(str)) -> int:
    line_count = 0
    while big_file.readline():
        line_count = line_count + 1
    return line_count
```
2019-09-23 17:59:25 -07:00
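The two examples differ only in what the function receives: a path (`InputPath`) or an already-open text stream (`InputTextFile`). The difference can be sketched with a hypothetical local harness, `call_with_input_file`. The SDK annotations are omitted so the sketch has no dependencies.

```python
import os
import tempfile


def call_with_input_file(func, parameter_name: str, text: str, as_stream: bool = False):
    """Hypothetical local harness: materialize `text` as a file, pass it as a path or an open stream."""
    with tempfile.TemporaryDirectory() as temp_dir:
        input_path = os.path.join(temp_dir, 'input')
        with open(input_path, 'w') as input_file:
            input_file.write(text)
        if as_stream:
            # InputTextFile-style: the function gets an open text stream
            with open(input_path) as stream:
                return func(**{parameter_name: stream})
        # InputPath-style: the function gets the path and opens the file itself
        return func(**{parameter_name: input_path})


def consume_big_file_path(big_file_path) -> int:
    line_count = 0
    with open(big_file_path) as f:
        while f.readline():
            line_count = line_count + 1
    return line_count


def consume_big_file(big_file) -> int:
    line_count = 0
    while big_file.readline():
        line_count = line_count + 1
    return line_count
```
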
Alexey Volkov 1c287f2f89 SDK - Components - Simplified arg-parsing code using argparse.SUPPRESS (#2193) 2019-09-23 13:45:24 -07:00
Alexey Volkov c914df542c SDK - Python components - Properly serializing outputs (#2198)
* SDK - Tests - Added better helper functions for testing python components

* SDK - Python components - Properly serializing outputs
Background:
Component arguments are already properly serialized when calling the component program and then deserialized before the execution of the component function.
But the component outputs were only serialized using `str()` which is inadequate for data types like lists or dictionaries.

This commit fixes the mismatch: the outputs are now serialized the same way as arguments and default values.
2019-09-23 12:29:33 -07:00
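Why `str()` is inadequate for collections can be shown in a few lines. `serialize_output` is a hypothetical helper. It assumes JSON as the serialization format for lists and dictionaries, which may differ from the SDK's actual serializers.

```python
import json


def serialize_output(value) -> str:
    """Hypothetical sketch: JSON for collections, str() for simple values."""
    if isinstance(value, (list, tuple, dict)):
        return json.dumps(value)
    return str(value)


# str() yields Python repr syntax that a JSON-reading consumer cannot parse
naive = str({'k': None})              # "{'k': None}"
serialized = serialize_output({'k': None})  # '{"k": null}'
```
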
Alexey Volkov db6625ff96 SDK - Removed some dead code (#2194) 2019-09-23 12:29:25 -07:00
Alexey Volkov c4c0bb8202 SDK - Components - Fixed kfp.components.set_default_base_image (#2118) 2019-09-16 15:30:26 -07:00
Alexey Volkov 647867bde1 SDK - Python components - Fixed the default base_image handling (#2119)
In Python, default parameter values are evaluated only once, when the function is defined.
2019-09-16 13:42:38 -07:00
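The pitfall and the usual fix are sketched below. All names and the placeholder image are hypothetical. A default computed by calling a getter is frozen when the `def` statement runs, so a later change to the default is ignored. Defaulting to `None` and resolving inside the body picks up the current value.

```python
from typing import Optional

_default_image = 'python:3.7'  # hypothetical placeholder default


def set_default_image(image: str) -> None:
    global _default_image
    _default_image = image


def get_default_image() -> str:
    return _default_image


def resolve_image_eager(base_image: str = get_default_image()) -> str:
    # Pitfall: this default was evaluated once, when the `def` statement ran
    return base_image


def resolve_image_lazy(base_image: Optional[str] = None) -> str:
    # Fix: look the default up every time the function is called
    if base_image is None:
        base_image = get_default_image()
    return base_image
```
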
Alexey Volkov 77d0ee014e SDK - Lightweight - Made wrapper code compatible with Python 2 (#2035) 2019-09-13 16:44:40 -07:00
Alexey Volkov 6c15f27f7e SDK - Components - Hiding signature attribute from CloudPickle (#2045)
* SDK - Components - Hiding signature attribute from CloudPickle

Cloudpickle has some issues with pickling type annotations in python versions < 3.7, so they disabled it. https://github.com/cloudpipe/cloudpickle/issues/196
`create_component_from_airflow_op` spoofs the function signature by setting the `func.__signature__` attribute. cloudpickle then tries to pickle that attribute, which leads to failures during unpickling.
To prevent this we remove the `.__signature__` attribute before pickling.

* Added comments

        # Hack to prevent cloudpickle from trying to pickle generic types that might be present in the signature. See https://github.com/cloudpipe/cloudpickle/issues/196 
        # Currently the __signature__ is only set by Airflow components as a means to spoof/pass the function signature to _func_to_component_spec
2019-09-06 11:12:15 -07:00
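Temporarily hiding a spoofed `__signature__` can be sketched as a context manager. `signature_hidden` is hypothetical, and no cloudpickle call is shown: the body of the `with` block is where pickling would happen. `inspect.signature` honors `__signature__`, which is why the attribute is useful for spoofing in the first place.

```python
import contextlib
import inspect


@contextlib.contextmanager
def signature_hidden(func):
    """Hypothetical: remove a spoofed __signature__ for the duration of the block, then restore it."""
    saved = func.__dict__.pop('__signature__', None)
    try:
        yield func
    finally:
        if saved is not None:
            func.__signature__ = saved


def spoofed(*args, **kwargs):
    pass


# Spoof the signature the way a wrapper might, so inspect.signature reports "(x)"
spoofed.__signature__ = inspect.Signature(
    [inspect.Parameter('x', inspect.Parameter.POSITIONAL_OR_KEYWORD)]
)
```
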
Alexey Volkov 5dbea6cb91 SDK - Components - Setting default base image or image factory (#1937)
Added kfp.components.set_default_base_image which sets the name of the container image that will be used for component creation when base_image is not specified.
Alternatively, the base image can be set to a factory function that returns the image.

The support is added for both Lightweight components and python container components.
2019-08-26 17:48:40 -07:00
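Supporting "image or factory" mostly comes down to one `callable()` check at resolution time. The sketch below uses hypothetical names and a placeholder default. It is not the signature of `kfp.components.set_default_base_image`.

```python
from typing import Callable, Optional, Union

ImageOrFactory = Union[str, Callable[[], str]]

_default_base_image_or_factory: ImageOrFactory = 'python:3.7'  # hypothetical placeholder


def set_default_base_image_or_factory(image_or_factory: ImageOrFactory) -> None:
    global _default_base_image_or_factory
    _default_base_image_or_factory = image_or_factory


def resolve_base_image(base_image: Optional[ImageOrFactory] = None) -> str:
    # An explicitly passed base_image wins over the global default
    if base_image is None:
        base_image = _default_base_image_or_factory
    # A factory is called each time an image name is needed
    return base_image() if callable(base_image) else base_image
```
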
Alexey Volkov 17e18a162e SDK - Components - Improved serialization and deserialization of arguments and defaults (#1934)
* SDK - Components - Improved serialization and deserialization of arguments and defaults

Properly serialize default values and passed arguments using the same code.
Check the types of passed argument values and issue warnings.
Improved argument reference type compatibility checking. When types do not match, there is always either an error or a warning.
When creating component from python function, the input types are now canonicalized.

* Addressed the feedback
2019-08-23 18:18:25 -07:00
Alexey Volkov 203307dbaf
SDK - Lightweight - Fixed custom types in multi-output case (#1875)
The type was mistakenly serialized as `_ForwardRef('CustomType')`.
The input parameter types and single-output types were not affected.
2019-08-21 16:37:21 -07:00
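The fix can be sketched as unwrapping `typing.ForwardRef` before naming the type. `type_to_type_name` is a hypothetical helper. When a `NamedTuple` field type is given as a string such as `'CustomType'`, `typing` generally stores it as `ForwardRef('CustomType')`, and formatting that object with `str()` does not yield the plain name.

```python
import typing


def type_to_type_name(annotation) -> str:
    """Hypothetical sketch: readable name for a type annotation."""
    if isinstance(annotation, typing.ForwardRef):
        # ForwardRef('CustomType') -> 'CustomType'
        return annotation.__forward_arg__
    if isinstance(annotation, str):
        return annotation
    return getattr(annotation, '__name__', str(annotation))
```
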
Alexey Volkov 7917ea475e SDK - Lightweight - Added support for complex default values (#1696) 2019-08-12 02:35:13 -07:00
Alexey Volkov dd59bc2597 SDK - Lightweight - Fixed regression for components without outputs (#1726) 2019-08-05 21:47:53 -07:00
Alexey Volkov 94969d6264 SDK/Lightweight - Updated default image to tensorflow:1.13.2-py3 (#1671) 2019-08-02 18:31:53 -07:00
Alexey Volkov f6cf9c5f55 SDK - Lightweight - Added support for "None" default values (#1626)
* SDK - Lightweight - Added support for "None" default values
Previously it was impossible to pass None to components since it was being converted to the string "None".

* is_required = not input.optional for now
As asked by @gaoning777
2019-07-25 18:49:59 -07:00
Alexey Volkov 3aeab312f2 SDK/Lightweight - Use argparse for command-line parsing (#1534)
It's required to correctly handle None arguments or None default values (also needed for optional and variable-number inputs).
It's easier to understand and generates better command-line code.
2019-06-23 16:45:53 -07:00
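Below is a sketch of how argparse can keep `None` from turning into the string `"None"`. It also uses `argparse.SUPPRESS`, which the later commit #2193 mentions. With `default=argparse.SUPPRESS`, an omitted flag is simply absent from the parsed namespace, so the Python function's own default applies. `my_func` and the flag names are hypothetical, and this is not the generated wrapper code.

```python
import argparse


def my_func(count: int, label: str = None):
    # Hypothetical component function with one required and one optional input
    return count, label


parser = argparse.ArgumentParser(prog='my_component')
parser.add_argument('--count', dest='count', type=int, required=True, default=argparse.SUPPRESS)
# An omitted --label is left out of the namespace instead of being set to a string
parser.add_argument('--label', dest='label', type=str, required=False, default=argparse.SUPPRESS)


def run(argv):
    # Only the arguments that were actually passed reach the function
    return my_func(**vars(parser.parse_args(argv)))
```
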
Alexey Volkov ce8df162a9 SDK/Lightweight - Added python version compatibility checks (#1524)
* SDK - Refactored the code in kfp.components._python_op._capture_function_code_using_cloudpickle

* SDK/Lightweight - Added python version compatibility checks

See my compatibility analysis: https://github.com/cloudpipe/cloudpickle/issues/293
2019-06-23 14:41:54 -07:00
Alexey Volkov 627b412f24 SDK/Lightweight - Disabled code pickling by default (#1512)
I introduced code pickling to capture dependencies in https://github.com/kubeflow/pipelines/pull/1372
Later I discovered that there is a serious opcode incompatibility between Python versions 3.5 and 3.6+. See my analysis of the issue: https://github.com/cloudpipe/cloudpickle/issues/293

Due to this issue, I decided to switch back to using source code copying by default and to continue improving it.

Until we stop supporting python 3.5 (https://github.com/kubeflow/pipelines/pull/668) it's too dangerous to use code pickling by default.

Code pickling can be enabled by specifying `pickle_code=True` when calling `func_to_container_op`
2019-06-18 19:44:30 -07:00
Alexey Volkov b935836c30 SDK/Lightweight - Enable cloudpickle installation from non-root users (#1511) 2019-06-17 18:56:15 -07:00
Alexey Volkov e90085ecb3 SDK - Refactored _func_to_component_spec to split code generation from signature analysis (#1334)
* SDK - Refactored _func_to_component_spec to split out the function signature analyzer

* Renamed function to _extract_component_interface
2019-06-17 18:02:16 -07:00
Alexey Volkov aee1b5e2e5 SDK - Improving python component logs by making stdout and stderr unbuffered (#1510)
Without this, the output and error lines can be printed in the wrong order, and sometimes not printed at all.
2019-06-14 00:20:20 -07:00
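The ordering problem can be reproduced with a child interpreter whose stderr is merged into stdout. This log doesn't say which mechanism the commit used. Running with `-u` (equivalent to setting `PYTHONUNBUFFERED`) is one standard way to make both streams unbuffered. `run_unbuffered` is a hypothetical helper.

```python
import subprocess
import sys

PROGRAM = "import sys; print('out 1'); print('err', file=sys.stderr); print('out 2')"


def run_unbuffered(code: str) -> str:
    """Run `code` with an unbuffered child interpreter, merging stderr into stdout."""
    result = subprocess.run(
        [sys.executable, '-u', '-c', code],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # one pipe, so the write order is what we observe
        text=True,
        check=True,
    )
    return result.stdout
```

Without `-u`, stdout written to a pipe is block-buffered, so `err` would typically appear before `out 1`.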
Krassimir Valev 8938669d7d Base64 encode the pickled code (#1476)
Due to its nature, Argo will replace any strings it encounters
that are enclosed in double curly braces, which would make the code
non-executable. To work around this, the code is Base64-encoded in the Argo
YAML template and decoded on the fly, before execution.
2019-06-13 23:30:25 -07:00
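The encode-then-decode bootstrap can be sketched as below. It encodes plain source code rather than pickled code, and `make_bootstrap` is a hypothetical helper. The Base64 alphabet contains only `A-Z`, `a-z`, `0-9`, `+`, `/` and `=`, so the payload can never contain the `{{` that Argo would try to substitute.

```python
import base64


def make_bootstrap(code: str) -> str:
    """Hypothetical: a one-line Python program that decodes and runs Base64-encoded `code`."""
    encoded = base64.b64encode(code.encode('utf-8')).decode('ascii')
    return "import base64; exec(base64.b64decode('" + encoded + "').decode('utf-8'))"


# The original code contains a "{{...}}" string that Argo would otherwise substitute
code = "template = '{{inputs.parameters.x}}'\nresult = len(template)"
bootstrap = make_bootstrap(code)
```
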
Alexey Volkov d724a4b68d SDK - Controlling which modules are captured with Lightweight components (#1435)
* SDK - Controlling which modules are captured with Lightweight components

All func_to_* functions now accept the modules_to_capture parameter: List of module names that will be captured (instead of just referencing) during the dependency scan. By default the func.__module__ is captured.

* Described the behavior more in depth.

* Added a test to check that only dependencies are captured
2019-06-07 18:47:06 -07:00
Alexey Volkov ab97d5708d SDK - Only install cloudpickle if it's not available (#1434)
This makes unit tests much faster.
Also:
Pinned the version to 1.1.1.
Made the installation quiet.
2019-06-04 17:57:53 -07:00
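The "install only if missing" check can be sketched with `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be imported. The `cloudpickle==1.1.1` pin comes from the commit. The helper name and the choice to return a command rather than run it are assumptions.

```python
import importlib.util
import sys
from typing import List, Optional


def install_command_if_missing(module_name: str, requirement: str) -> Optional[List[str]]:
    """Hypothetical: a quiet pip command for `requirement`, or None if `module_name` is importable."""
    if importlib.util.find_spec(module_name) is not None:
        return None  # already available: skip the (slow) installation
    return [sys.executable, '-m', 'pip', 'install', '--quiet', requirement]


command = install_command_if_missing('cloudpickle', 'cloudpickle==1.1.1')
# If command is not None, it could be executed with subprocess.run(command, check=True)
```
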
Alexey Volkov 16213ba62d SDK - Dynamically installing cloudpickle module (#1429)
Fixes https://github.com/kubeflow/pipelines/issues/1426
2019-06-03 16:45:53 -07:00
Alexey Volkov 9a1d47a185 SDK - Capturing function dependencies when creating lightweight components (#1372)
* Transitively capturing code dependencies
Using cloudpickle.

* Got rid of func_type_declarations_code variable

* Extracted the function code extraction functions

* Improved support for capturing module-level dependencies

* Added test for capturing module-level dependencies

* Removed the _capture_function_code_using_source_copy function
As requested by Ning
2019-05-28 18:18:18 -07:00
Alexey Volkov b61bef04a3 SDK - Renamed ModelBase.from_struct/to_struct to from_dict/to_dict (#1290) 2019-05-07 14:06:35 -07:00
Alexey Volkov 5ab368ac10 Added support for default values to Lightweight python components (#890) 2019-03-01 14:51:18 -08:00
Alexey Volkov 32475bfafb SDK/Components/Python - Improved Python2 compatibility (#718)
Improved Python2 compatibility in Lightweight python components
2019-01-24 14:42:03 -08:00
Alexey Volkov 9b4088626c SDK/Components/Python - Made the typing.NamedTuple import optional (#717)
Now it's only imported if the return type is NamedTuple.
2019-01-23 16:31:13 -08:00
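Emitting the import only when needed requires a check for NamedTuple return types. The sketch uses the common "tuple subclass with `_fields`" test. `is_named_tuple_type` and `make_prelude` are hypothetical names, not the SDK's code generator.

```python
import typing


def is_named_tuple_type(annotation) -> bool:
    # NamedTuple classes are tuple subclasses that carry a _fields attribute
    return isinstance(annotation, type) and issubclass(annotation, tuple) and hasattr(annotation, '_fields')


def make_prelude(return_annotation) -> str:
    """Hypothetical code generator: emit the NamedTuple import only when the return type needs it."""
    return 'from typing import NamedTuple\n' if is_named_tuple_type(return_annotation) else ''


SumProduct = typing.NamedTuple('SumProduct', [('sum', int), ('product', int)])
```
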
Alexey Volkov 83e9ffe5bc SDK/Components - Reworked the component model structures. (#642)
* Reworked the Component structures.
Rewrote parsing, type checking and serialization code.
Improved the graph component structures.
Added most of the needed k8s structures.
Added model validation (input/output existence etc).
Added task cycle detection and topological sorting to GraphSpec.
All container component tests now work.
Added some graph component tests.

* Fixed incompatibilities with python <3.7

* Added __init__.py to make the Travis tests work.

* Adding kubernetes structures to setup.py

* Addressed PR feedback: Renamed _original_names to _serialized_names

* Addressed PR feedback: Reduced indentation.

* Added descriptions for all component structures.

* Fixed a bug in ComponentSpec._post_init()

* Added documentation for ModelBase class and functions.

* Added __eq__/__ne__ and improved __repr__

* Added ModelBase tests
2019-01-09 15:51:34 -08:00
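Task cycle detection and topological sorting, as added to GraphSpec, can be sketched with the standard `graphlib` module (Python 3.9+). This is a swapped-in technique, not the SDK's own implementation. The input format, task name mapped to the set of task names it depends on, is an assumption.

```python
import graphlib
from typing import Dict, List, Set


def order_tasks(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical: order tasks so each one comes after its dependencies; reject cycles."""
    sorter = graphlib.TopologicalSorter(dependencies)
    try:
        return list(sorter.static_order())
    except graphlib.CycleError as error:
        # error.args[1] lists the nodes that form the cycle
        raise ValueError('Task cycle detected: {}'.format(error.args[1])) from error
```
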